2601.01224v2 Jan 03, 2026 cs.CV

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment

Stefano Ermon

Yuhta Takida

N. Murata

Yuki Mitsufuji

Bac Nguyen

Chieh-Hsin Lai

Toshimitsu Uesaka

Original Abstract

Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), a simple extension that (i) employs register slots to absorb residual attention and reduce interference between object slots, and (ii) applies a contrastive alignment loss to explicitly encourage slot-image correspondence. The resulting training objective serves as a tractable surrogate for maximizing mutual information (MI) between slots and inputs, strengthening slot representation quality. On both synthetic (MOVi-C/E) and real-world datasets (VOC, COCO), CODA improves object discovery (e.g., +6.1% FG-ARI on COCO), property prediction, and compositional image generation over strong baselines. Register slots add negligible overhead, keeping CODA efficient and scalable. These results indicate potential applications of CODA as an effective framework for robust OCL in complex, real-world scenes. Code and pretrained models are available at https://github.com/sony/coda.
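The abstract names two mechanisms but gives no equations, so the following is only a minimal pure-Python sketch of how they could fit together. It is not the CODA implementation; the authors' code is in the linked repository. It rests on several assumptions. Register slots are modeled as extra slots that join the per-feature softmax competition of Slot Attention, and their aggregated outputs are then discarded. The contrastive alignment loss is modeled as an InfoNCE objective, with the other images in the batch serving as negatives. The critic `slot_image_score`, every function name, and the temperature of 0.1 are hypothetical choices made for illustration. Raw dot products also stand in for the learned key/query projections.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_with_registers(features, object_slots, register_slots):
    """Slot-Attention-style competition over object AND register slots.

    For each input feature, a softmax is taken across all slots, so each feature
    distributes one unit of attention mass among them. Mass that goes to the
    (hypothetical) register slots is dropped from the object slots.
    Simplification: raw dot products replace learned key/query projections.
    """
    slots = object_slots + register_slots
    k = len(object_slots)
    scale = 1.0 / math.sqrt(len(features[0]))
    obj_attn, reg_mass = [], []
    for f in features:
        w = softmax([scale * dot(f, s) for s in slots])
        obj_attn.append(w[:k])       # attention kept by object slots
        reg_mass.append(sum(w[k:]))  # residual attention absorbed by registers
    return obj_attn, reg_mass


def update_object_slots(features, obj_attn, eps=1e-8):
    """Weighted-mean aggregation of features into each object slot.

    The GRU/MLP refinement of the original Slot Attention is omitted.
    """
    d = len(features[0])
    new_slots = []
    for j in range(len(obj_attn[0])):
        col = [row[j] for row in obj_attn]
        total = sum(col) + eps
        new_slots.append(
            [sum(c * f[t] for c, f in zip(col, features)) / total for t in range(d)]
        )
    return new_slots


def slot_image_score(slots, image_feats):
    """Hypothetical critic: mean over image features of the best-matching slot."""
    return sum(max(dot(s, f) for s in slots) for f in image_feats) / len(image_feats)


def contrastive_alignment_loss(slot_sets, image_sets, temperature=0.1):
    """InfoNCE over a batch: the slots of image i are the positive for image i,
    and the other images in the batch serve as negatives."""
    n = len(slot_sets)
    loss = 0.0
    for i in range(n):
        logits = [slot_image_score(slot_sets[i], image_sets[j]) / temperature
                  for j in range(n)]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_z - logits[i]  # cross-entropy with target index i
    return loss / n


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    obj_attn, reg_mass = attention_with_registers(feats, [[2.0, 0.0]], [[0.0, 2.0]])
    print("register mass per feature:", [round(r, 3) for r in reg_mass])
    print("updated object slot:", update_object_slots(feats, obj_attn))

    images = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
    slot_sets = [[[1.0, 0.0]], [[0.0, 1.0]]]
    aligned = contrastive_alignment_loss(slot_sets, images)
    shuffled = contrastive_alignment_loss(slot_sets[::-1], images)
    # Standard InfoNCE analysis: in expectation, log(N) - loss lower-bounds the MI.
    print("aligned loss:", aligned, "MI bound:", math.log(len(images)) - aligned)
    print("shuffled loss:", shuffled)
```

In the toy run, the second feature sends most of its attention to the register slot, so it barely moves the object slot. The aligned batch gets a near-zero contrastive loss, while swapping slots between images pushes the loss to roughly 10. InfoNCE is used here because, under its standard analysis, log N minus the loss lower-bounds mutual information in expectation. That property is one plausible reading of the abstract's "tractable surrogate for maximizing mutual information", not a confirmed description of the paper's loss.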
