2603.17474v1 Mar 18, 2026 cs.CV

Revisiting Cross-Attention Mechanisms: Leveraging Beneficial Noise for Domain-Adaptive Learning

Yehui Yang (Citations: 85; h-index: 5)

Zelin Zang (Citations: 550; h-index: 14)

Liang Li (Citations: 738; h-index: 6)

Baigui Sun (Citations: 2,110; h-index: 21)

Fei Wang (Citations: 1; h-index: 1)

Original Abstract

Unsupervised Domain Adaptation (UDA) seeks to transfer knowledge from a labeled source domain to an unlabeled target domain but often suffers from severe domain and scale gaps that degrade performance. Existing cross-attention-based transformers can align features across domains, yet they struggle to preserve content semantics under large appearance and scale variations. To explicitly address these challenges, we introduce the concept of beneficial noise, which regularizes cross-attention by injecting controlled perturbations, encouraging the model to ignore style distractions and focus on content. We propose the Domain-Adaptive Cross-Scale Matching (DACSM) framework, which consists of a Domain-Adaptive Transformer (DAT) for disentangling domain-shared content from domain-specific style, and a Cross-Scale Matching (CSM) module that adaptively aligns features across multiple resolutions. DAT incorporates beneficial noise into cross-attention, enabling progressive domain translation with enhanced robustness, yielding content-consistent and style-invariant representations. Meanwhile, CSM ensures semantic consistency under scale changes. Extensive experiments on VisDA-2017, Office-Home, and DomainNet demonstrate that DACSM achieves state-of-the-art performance, with up to +2.3% improvement over CDTrans on VisDA-2017. Notably, DACSM achieves a +5.9% gain on the challenging "truck" class of VisDA, evidencing the strength of beneficial noise in handling scale discrepancies. These results highlight the effectiveness of combining domain translation, beneficial-noise-enhanced attention, and scale-aware alignment for robust cross-domain representation learning.
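
The abstract describes beneficial noise only at a high level: controlled perturbations are injected to regularize cross-attention. The sketch below is one possible reading of that idea, not the authors' DAT implementation. It assumes the noise is zero-mean Gaussian, that it is added to the target-domain tokens used as keys and values, and that queries come from the source domain. The function name `noisy_cross_attention` and the parameter `noise_std` are hypothetical, and learned query/key/value projections are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def noisy_cross_attention(source_tokens, target_tokens, noise_std=0.1, rng=None):
    """Single-head cross-attention with additive Gaussian noise.

    Queries are the source tokens; keys and values are the target tokens.
    ASSUMPTION: "beneficial noise" is modeled here as zero-mean Gaussian
    noise added to the target tokens. The paper's abstract does not
    specify where or how the noise is injected.
    """
    rng = np.random.default_rng() if rng is None else rng
    dim = source_tokens.shape[-1]

    # Perturb the target tokens (hypothetical injection point).
    noise = rng.normal(0.0, noise_std, size=target_tokens.shape)
    perturbed = target_tokens + noise

    # Scaled dot-product attention from source queries to perturbed targets.
    scores = source_tokens @ perturbed.T / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    return weights @ perturbed, weights


# Usage: 4 source tokens attend over 6 target tokens, feature dimension 8.
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 8))
tgt = rng.normal(size=(6, 8))
out, w = noisy_cross_attention(src, tgt, noise_std=0.1, rng=rng)
```

Setting `noise_std=0.0` recovers plain cross-attention, which gives a simple baseline for checking how much the perturbation changes the attention weights.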
