2604.13552v1 Apr 15, 2026 cs.CL

Training-Free Test-Time Contrastive Learning for Large Language Models

Jinwu Hu (citations: 64, h-index: 5)
Kaiwen Zheng (citations: 29, h-index: 4)
Te Gu (citations: 0, h-index: 0)
Min Peng (citations: 68, h-index: 4)
Fei Liu (citations: 35, h-index: 2)
Kaiwen Zhou (citations: 1, h-index: 1)

Original Abstract

Large language models (LLMs) demonstrate strong reasoning capabilities, but their performance often degrades under distribution shift. Existing test-time adaptation (TTA) methods rely on gradient-based updates that require white-box access and incur substantial overhead, while training-free alternatives are either static or depend on external guidance. In this paper, we propose Training-Free Test-Time Contrastive Learning (TF-TTCL), a training-free adaptation framework that enables a frozen LLM to improve online by distilling supervision from its own inference experiences. Specifically, TF-TTCL implements a dynamic "Explore-Reflect-Steer" loop through three core modules: 1) Semantic Query Augmentation first diversifies problem views via multi-agent role-playing to generate different reasoning trajectories; 2) Contrastive Experience Distillation then captures the semantic gap between superior and inferior trajectories and distills it into explicit textual rules; and 3) Contextual Rule Retrieval finally activates these stored rules during inference to dynamically steer the frozen LLM toward robust reasoning patterns while avoiding observed errors. Extensive experiments on closed-ended reasoning tasks and open-ended evaluation tasks demonstrate that TF-TTCL consistently outperforms strong zero-shot baselines and representative TTA methods under online evaluation. Code is available at https://github.com/KevinSCUTer/TF-TTCL.
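The "Explore-Reflect-Steer" loop described in the abstract can be pictured with a toy sketch. This is not the authors' implementation (see the repository linked above for that). The abstract does not say how trajectories are ranked or how rules are retrieved, so the sketch assumes majority agreement as the superior/inferior signal and word-overlap (Jaccard) similarity for retrieval. All names here (`ROLES`, `RuleMemory`, `explore`, `reflect`, `answer_online`, `toy_llm`) and all prompt strings are hypothetical, and the frozen model is replaced by a deterministic stub so the example runs without any external service.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

LLM = Callable[[str], str]  # frozen model: prompt in, text out (no weight updates)

# Hypothetical personas used to diversify views of the same problem.
ROLES = ["careful mathematician", "skeptical reviewer", "step-by-step tutor"]


def tokens(text: str) -> set:
    return set(text.lower().split())


@dataclass
class RuleMemory:
    # Each entry pairs the query a rule was learned from with the rule text.
    rules: List[Tuple[str, str]] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        """Return up to k rules whose source query overlaps with `query`.

        Assumption: Jaccard word overlap stands in for contextual retrieval.
        """
        q = tokens(query)
        scored = []
        for source, rule in self.rules:
            s = tokens(source)
            union = q | s
            sim = len(q & s) / len(union) if union else 0.0
            if sim > 0:
                scored.append((sim, rule))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [rule for _, rule in scored[:k]]


def explore(llm: LLM, query: str, rules: List[str]) -> List[str]:
    """Explore: one trajectory per role, with retrieved rules prepended."""
    hints = "".join(f"Rule: {r}\n" for r in rules)
    return [llm(f"{hints}As a {role}, solve: {query}") for role in ROLES]


def reflect(llm: LLM, query: str, trajectories: List[str]) -> Optional[str]:
    """Reflect: contrast the majority trajectory with a dissenting one.

    Assumption: agreement with the majority marks a trajectory as superior.
    Returns None when every trajectory agrees (nothing to contrast).
    """
    majority = Counter(trajectories).most_common(1)[0][0]
    dissenting = [t for t in trajectories if t != majority]
    if not dissenting:
        return None
    return llm(
        f"Question: {query}\nBetter: {majority}\nWorse: {dissenting[0]}\n"
        "State one rule that explains the gap."
    )


def answer_online(llm: LLM, memory: RuleMemory, query: str) -> str:
    rules = memory.retrieve(query)               # Steer with stored rules
    trajectories = explore(llm, query, rules)    # Explore
    rule = reflect(llm, query, trajectories)     # Reflect
    if rule is not None:
        memory.rules.append((query, rule))
    return Counter(trajectories).most_common(1)[0][0]


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a frozen LLM (illustration only)."""
    if "State one rule" in prompt:
        return "Recompute the product before answering."
    if "skeptical reviewer" in prompt and "Rule:" not in prompt:
        return "41"  # simulated error that disappears once a rule is supplied
    return "42"


memory = RuleMemory()
first = answer_online(toy_llm, memory, "What is 6 times 7?")
second = answer_online(toy_llm, memory, "What is 6 times 7 again?")
```

In this toy run the first query produces one dissenting trajectory, so a rule is stored. The second, similar query retrieves that rule and all trajectories agree, so no new rule is added. A real system would compare full reasoning traces rather than bare answers, and would need a more careful quality signal than majority agreement.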

Citations: 0, Influential: 0, Altmetric: 32.897207708399, Score: 164.5
