2601.02151v1 Jan 05, 2026 cs.LG

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Wuxuan Gong (Citations: 21, h-index: 3)
Muxi Diao (Citations: 428, h-index: 7)
Yutong Zhang (Citations: 227, h-index: 5)
Lele Yang (Citations: 12, h-index: 3)
Zhonghao Yan (Citations: 60, h-index: 4)
Yufei Han (Citations: 32, h-index: 3)
Kongming Liang (Citations: 6, h-index: 1)
Weiran Xu (Citations: 43, h-index: 4)
Zhanyu Ma (Citations: 120, h-index: 6)

Original Abstract

Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning (RL) effectively preserves general capabilities. We investigate this discrepancy and identify a fundamental distributional gap: while RL aligns with the model's internal belief, SFT forces the model to fit external supervision. This mismatch often manifests as "Confident Conflicts": tokens characterized by low probability but low entropy. In these instances, the model is highly confident in its own prediction but is forced to learn a divergent ground truth, triggering destructive gradient updates. To address this, we propose Entropy-Adaptive Fine-Tuning (EAFT). Unlike methods relying solely on prediction probability, EAFT utilizes token-level entropy as a gating mechanism to distinguish between epistemic uncertainty and knowledge conflict. This allows the model to learn from uncertain samples while suppressing gradients on conflicting data. Extensive experiments on Qwen and GLM series (ranging from 4B to 32B parameters) across mathematical, medical, and agentic domains confirm our hypothesis. EAFT consistently matches the downstream performance of standard SFT while significantly mitigating the degradation of general capabilities.
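To make the gating idea concrete, here is a minimal sketch of an entropy-gated token loss. The abstract does not give the exact gating formula, so the choice below (scaling each token's cross-entropy by its entropy normalized to [0, 1]) is an assumption for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(vocab size), so the value lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def entropy_gated_token_loss(logits, target):
    """Return (gated_loss, gate, cross_entropy) for a single token.

    ASSUMPTION: the gate is the normalized entropy of the model's own
    prediction. In an autograd framework the gate would be treated as a
    constant (detached), so it only rescales the gradient.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    gate = normalized_entropy(probs)
    return gate * cross_entropy, gate, cross_entropy


# Confident conflict: the model is nearly certain of token 0, the label is token 1.
confident_conflict = entropy_gated_token_loss([10.0, 0.0, 0.0, 0.0], target=1)

# Epistemic uncertainty: a flat distribution, so the label is equally unlikely
# but the model holds no strong prior belief.
uncertain = entropy_gated_token_loss([0.0, 0.0, 0.0, 0.0], target=1)

print("confident conflict:", confident_conflict)
print("uncertain:", uncertain)
```

Under this assumed gate, the confident-conflict token has the larger raw cross-entropy but contributes far less to the gated loss than the uncertain token. This is the behavior the abstract describes: learning from uncertain samples while suppressing gradients on conflicting ones.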

4 Citations

2 Influential

3.5 Altmetric

25.5 Score
