2602.01346v1 Feb 01, 2026 cs.AI

Model Specific Task Similarity for Vision Language Model Selection via Layer Conductance

Wei Yang (Citations: 267, h-index: 7)
Hong Xie (Citations: 5, h-index: 1)
Tao Tan (Citations: 10, h-index: 2)
Xin Li (Citations: 0, h-index: 0)
Enhong Chen (Citations: 1,797, h-index: 20)
Defu Lian (Citations: 13,393, h-index: 51)

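Open-source vision-language models (VLMs) are multiplying rapidly, yet choosing the pretrained model best suited to a particular downstream task remains difficult. In few-shot scenarios, data scarcity and computational constraints often make it impractical to evaluate every model exhaustively. Existing selection methods do not fully solve this problem: they either depend on data-intensive proxies or use symmetric textual descriptors that overlook the inherently directional and model-specific nature of transferability. To address this, the paper proposes a model selection framework grounded in the internal functional dynamics of the vision encoder. The approach represents each task by its layer-wise conductance and derives a target-conditioned block importance distribution through entropy-regularized alignment. On this basis it introduces Directional Conductance Divergence (DCD), an asymmetric metric that quantifies how effectively a source task covers the target's key functional blocks. This makes it possible to predict the model ranking on the target by aggregating source-task rankings, without running direct inference. In experiments on 21 datasets and 48 VLMs, the proposed method outperforms state-of-the-art baselines and improves NDCG@5 by 14.7% over SWAB.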

Original Abstract

While open-source Vision-Language Models (VLMs) have proliferated, selecting the optimal pretrained model for a specific downstream task remains challenging. Exhaustive evaluation is often infeasible due to computational constraints and data limitations in few-shot scenarios. Existing selection methods fail to fully address this: they either rely on data-intensive proxies or use symmetric textual descriptors that neglect the inherently directional and model-specific nature of transferability. To address this problem, we propose a framework that grounds model selection in the internal functional dynamics of the visual encoder. Our approach represents each task via layer-wise conductance and derives a target-conditioned block importance distribution through entropy-regularized alignment. Building on this, we introduce Directional Conductance Divergence (DCD), an asymmetric metric that quantifies how effectively a source task covers the target's salient functional blocks. This allows for predicting target model rankings by aggregating source task ranks without direct inference. Experimental results on 48 VLMs across 21 datasets demonstrate that our method outperforms state-of-the-art baselines, achieving a 14.7% improvement in NDCG@5 over SWAB.
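
The abstract gives no formulas, so the following is only a rough sketch of the overall shape of such a pipeline, not the authors' method. It assumes each task is already summarized as a per-block importance distribution, and it substitutes plain KL divergence for DCD simply because KL is also asymmetric; the actual DCD definition, the conductance computation, and the entropy-regularized alignment are not reproduced here. The function names (`kl_divergence`, `aggregate_scores`, `ndcg_at_k`), the `temperature` parameter, the use of per-model scores instead of ranks, and all toy numbers are hypothetical. NDCG@k is the standard ranking metric with linear gain and a log2 discount.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q). Stand-in for an asymmetric task divergence (NOT the paper's DCD).
    It grows when q places little mass on blocks that p considers important."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def aggregate_scores(target_imp, source_imps, source_scores, temperature=1.0):
    """Weight each source task by exp(-divergence / temperature) and average the
    per-model scores observed on the source tasks (assumed aggregation rule)."""
    divs = np.array([kl_divergence(target_imp, s) for s in source_imps])
    weights = np.exp(-divs / temperature)
    weights /= weights.sum()
    # source_scores has shape (n_sources, n_models); result has shape (n_models,)
    return weights @ np.asarray(source_scores, dtype=float)

def ndcg_at_k(predicted_scores, true_relevance, k=5):
    """NDCG@k of the ranking induced by predicted_scores, with linear gain."""
    predicted_scores = np.asarray(predicted_scores, dtype=float)
    true_relevance = np.asarray(true_relevance, dtype=float)
    k = min(k, len(true_relevance))
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    top = np.argsort(-predicted_scores)[:k]          # indices of top-k predictions
    dcg = float(np.sum(true_relevance[top] * discounts))
    ideal = np.sort(true_relevance)[::-1][:k]        # best achievable ordering
    idcg = float(np.sum(ideal * discounts))
    return dcg / idcg if idcg > 0 else 0.0

if __name__ == "__main__":
    # Toy, made-up data: 3 encoder blocks, 2 source tasks, 6 candidate models.
    target = [0.8, 0.1, 0.1]
    sources = [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]]
    scores = [[0.9, 0.8, 0.6, 0.5, 0.4, 0.2],
              [0.3, 0.4, 0.9, 0.8, 0.1, 0.6]]
    predicted = aggregate_scores(target, sources, scores)
    true_accuracy = [0.85, 0.80, 0.65, 0.50, 0.35, 0.30]
    print("predicted scores:", np.round(predicted, 3))
    print("NDCG@5:", round(ndcg_at_k(predicted, true_accuracy, k=5), 3))
```

The direction of the divergence matters in a sketch like this: measuring from the target to the source penalizes a source that neglects blocks the target relies on, which is the kind of coverage the abstract describes, whereas a symmetric distance would treat both directions alike.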
