2601.22663v1 Jan 30, 2026 cs.CV

Unsupervised Synthetic Image Attribution: Alignment and Disentanglement

Tongliang Liu (Citations: 107, h-index: 7)

Zongfang Liu (Citations: 12, h-index: 2)

Guan-Hong Chen (Citations: 509, h-index: 15)

Boyang Sun (Citations: 67, h-index: 2)

Kun Zhang (Citations: 5, h-index: 1)

Original Abstract

As the quality of synthetic images improves, identifying the underlying concepts of model-generated images is becoming increasingly crucial for copyright protection and model transparency. Existing methods achieve this attribution goal by training models on annotated pairs of synthetic images and their original training sources. However, obtaining such paired supervision is challenging, as it requires either well-designed synthetic concepts or precise annotations from millions of training sources. To eliminate the need for costly paired annotations, we explore the possibility of unsupervised synthetic image attribution. We propose a simple yet effective unsupervised method called Alignment and Disentanglement. Specifically, we begin by performing basic concept alignment using contrastive self-supervised learning. Next, we enhance the model's attribution ability by promoting representation disentanglement with the Infomax loss. This approach is motivated by an interesting observation: contrastive self-supervised models, such as MoCo and DINO, inherently exhibit the ability to perform simple cross-domain alignment. By formulating this observation as a theoretical assumption on cross-covariance, we provide a theoretical explanation of how alignment and disentanglement can approximate the concept-matching process through a decomposition of the canonical correlation analysis objective. On the real-world benchmark AbC, we show that our unsupervised method surprisingly outperforms supervised methods. As a starting point, we expect our intuitive insights and experimental findings to provide a fresh perspective on this challenging task.
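The two ingredients named in the abstract, matching a synthetic image to candidate sources in an embedding space and encouraging feature dimensions to be disentangled, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the form of the Infomax loss, so the sketch swaps in a simple off-diagonal covariance penalty as a stand-in, and the embeddings, names (`style_a`, `style_b`), and helper functions are all hypothetical. In practice the embeddings would come from a contrastive self-supervised encoder such as MoCo or DINO.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attribute(query, candidates):
    """Rank candidate source embeddings by similarity to a query embedding.

    Returns (name, score) pairs, most similar first.
    """
    scores = [(name, cosine(query, emb)) for name, emb in candidates.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def offdiag_covariance_penalty(features):
    """Sum of squared off-diagonal entries of the feature covariance matrix.

    Assumption: a stand-in for a disentanglement objective, NOT the paper's
    Infomax loss. It is zero when feature dimensions are uncorrelated
    across the batch.
    """
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i == j:
                continue
            cov = sum(
                (row[i] - means[i]) * (row[j] - means[j]) for row in features
            ) / (n - 1)
            penalty += cov * cov
    return penalty


# Hypothetical toy embeddings; a real pipeline would obtain them from an encoder.
sources = {"style_a": [1.0, 0.0, 0.0], "style_b": [0.0, 1.0, 0.0]}
synthetic = [0.9, 0.1, 0.0]
ranking = attribute(synthetic, sources)
```

Here `ranking` lists the candidate sources from most to least similar to the synthetic embedding, and `offdiag_covariance_penalty` could be added to a training objective to discourage correlated feature dimensions. Both are deliberately minimal and only meant to make the alignment and disentanglement ideas concrete.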

0 Citations

0 Influential

7.5 Altmetric

37.5 Score
