2603.05235v1 Mar 05, 2026 cs.AI

Reclaiming Lost Text Layers for Source-Free Cross-Domain Few-Shot Learning

Yixiong Zou (Citations: 285, h-index: 9)

Yuhua Li (Citations: 248, h-index: 9)

Ruixuan Li (Citations: 339, h-index: 11)

Zhenyu Zhang (Citations: 75, h-index: 4)

Guangyao Chen (Cornell University; Citations: 1,899, h-index: 11)

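Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) aims to fine-tune a model using only limited training data from a target domain, such as medical or satellite imagery, and CLIP has recently shown promising results on this task thanks to its generalization ability. Prior work reports that CLIP's text encoder is better suited to cross-domain tasks; this study, however, finds that removing certain middle layers of the text encoder can effectively improve SF-CDFSL performance, and names these layers the "Lost Layers". The paper analyzes this phenomenon in depth. The analysis shows that the information in these layers is not harmful to SF-CDFSL but beneficial; visual gaps prevent it from being fully used, which makes the layers appear redundant. Building on this understanding, and unlike prior work that simply removes the layers, the authors propose a method that teaches the model to **re-utilize** the information in the lost layers at both the layer level and the encoder level, guiding the re-learning of the visual branch under domain shift. The method effectively addresses the problem of underutilized information in the text encoder. Extensive experiments across settings, backbones (CLIP, SigLip, PE-Core), and tasks (4 CDFSL datasets and 10 Meta-dataset datasets) demonstrate its effectiveness. Code and related materials are available at https://github.com/zhenyuZ-HUST/CVPR26-VtT.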

Original Abstract

Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) focuses on fine-tuning with limited training data from target domains (e.g., medical or satellite images), where CLIP has recently shown promising results due to its generalizability to downstream tasks. Current works indicate that CLIP's text encoder is more suitable for cross-domain tasks; however, we find that **removing certain middle layers of the text encoder can effectively improve performance in SF-CDFSL**. We call these layers the Lost Layers. In this paper, we delve into this phenomenon for a deeper understanding. We discover that instead of being harmful for the SF-CDFSL task, the information in these layers is actually beneficial, but visual gaps prevent this useful information from being fully utilized, making these layers seem redundant. Based on this understanding, unlike current works that simply remove these layers, we propose a method that teaches the model to **re-utilize** information in these lost layers at both the layer and encoder levels, guiding the re-learning of the visual branch under domain shifts. Our approach effectively addresses the issue of underutilized information in the text encoder. Extensive experiments across various settings, backbones (CLIP, SigLip, PE-Core), and tasks (4 CDFSL datasets and 10 Meta-dataset datasets) demonstrate the effectiveness of our method. Code is available at https://github.com/zhenyuZ-HUST/CVPR26-VtT.
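
The layer-removal observation described in the abstract can be illustrated with a toy residual stack. This is a minimal sketch under stated assumptions, not the authors' implementation and not CLIP's API: `make_toy_block`, `encode`, the scale values, and the choice of which indices count as "middle layers" are all hypothetical. It only shows that, in a residual encoder, removing a block amounts to letting the hidden state pass through unchanged.

```python
from typing import Callable, Iterable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_toy_block(scale: float) -> Block:
    """Hypothetical residual update f(x) = scale * x (stand-in for attention + MLP)."""
    def block(x: Vector) -> Vector:
        return [scale * v for v in x]
    return block


def encode(x: Sequence[float], blocks: Sequence[Block], skip: Iterable[int] = ()) -> Vector:
    """Run a residual stack, bypassing the blocks whose indices are in `skip`.

    Each block contributes h <- h + block(h); a skipped block reduces to the
    identity, which is what "removing a layer" means in a residual encoder.
    """
    skipped = set(skip)
    h = list(x)
    for i, block in enumerate(blocks):
        if i in skipped:
            continue  # layer removed: hidden state passes through unchanged
        update = block(h)
        h = [a + b for a, b in zip(h, update)]
    return h


# Toy 4-block "text encoder"; indices 1 and 2 play the role of middle layers (assumption).
blocks = [make_toy_block(s) for s in (1.0, 0.5, 0.5, 1.0)]
full = encode([1.0, 2.0], blocks)                   # all four blocks applied
ablated = encode([1.0, 2.0], blocks, skip=[1, 2])   # middle blocks removed
print(full, ablated)
```

In a real setting the two outputs would be compared on few-shot accuracy in the target domain; the abstract's claim is that the skipped layers still carry useful information, so the proposed method re-utilizes them rather than dropping them.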
