2602.01745v1 Feb 02, 2026 cs.LG

Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning

Hao Zhang (Citations: 1,473; h-index: 4)
Minda Hu (Citations: 390; h-index: 8)
Wenhao Yu (Citations: 50; h-index: 4)
Shaohang Wei, Peking University (Citations: 61; h-index: 4)
Jiahong Liu (Citations: 589; h-index: 11)
Yifan Li (Citations: 33; h-index: 3)
Aiwei Liu, Tsinghua University (Citations: 1,837; h-index: 24)
Irwin King (Citations: 255; h-index: 8)

Original Abstract

Token-level reweighting is a simple yet effective mechanism for controlling supervised fine-tuning, but common indicators are largely one-dimensional: the ground-truth probability reflects downstream alignment, while token entropy reflects intrinsic uncertainty induced by the pre-training prior. Ignoring entropy can misidentify noisy or easily replaceable tokens as learning-critical, while ignoring probability fails to reflect target-specific alignment. RankTuner introduces a probability-entropy calibration signal, the Relative Rank Indicator, which compares the rank of the ground-truth token with its expected rank under the prediction distribution. The inverse indicator is used as a token-wise Relative Scale to reweight the fine-tuning objective, focusing updates on truly under-learned tokens without over-penalizing intrinsically uncertain positions. Experiments on multiple backbones show consistent improvements on mathematical reasoning benchmarks, transfer gains on out-of-distribution reasoning, and improved code generation performance over probability-only or entropy-only reweighting baselines.
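
The abstract does not state the formula for the Relative Rank Indicator or the Relative Scale, so the following is only a minimal sketch of the general idea under explicit assumptions: the rank of a token is its position when the vocabulary is sorted by predicted probability, the expected rank is the probability-weighted mean of those ranks, and the per-token weight is the ratio of the two. The function names, the ratio form, and its direction are all hypothetical choices made for illustration and may differ from what RankTuner actually does.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ranks(probs):
    """Rank 1 is the most probable token; tied tokens share the best rank."""
    return [1 + sum(1 for q in probs if q > p) for p in probs]


def expected_rank(probs):
    """Mean rank of a token sampled from the predictive distribution.

    Near 1 for a peaked (low-entropy) distribution, larger for a flat one.
    """
    return sum(p * r for p, r in zip(probs, ranks(probs)))


def assumed_relative_scale(logits, target):
    """ASSUMPTION: weight = rank of the ground-truth token / expected rank.

    A target ranked far below where the model's own samples usually land
    gets a large weight; the same rank at a high-entropy position is
    discounted because the expected rank is also large.
    """
    probs = softmax(logits)
    return ranks(probs)[target] / expected_rank(probs)


def reweighted_nll(logits_seq, targets):
    """Mean of weight * negative log-likelihood over a token sequence.

    In a real training loop the weight would be treated as a constant
    (no gradient flows through it); that detail is also an assumption.
    """
    total = 0.0
    for logits, t in zip(logits_seq, targets):
        weight = assumed_relative_scale(logits, t)
        total += weight * -math.log(softmax(logits)[t])
    return total / len(targets)


if __name__ == "__main__":
    flat = [1.0, 0.9, 0.8, 0.7]    # high-entropy position
    peaked = [5.0, 1.0, 0.5, 0.0]  # low-entropy position
    # The target has rank 4 in both cases, but the weights differ.
    print(assumed_relative_scale(flat, 3))
    print(assumed_relative_scale(peaked, 3))
    print(reweighted_nll([flat, peaked], [3, 3]))
```

Under these assumptions, a probability-only indicator would treat the two rank-4 targets similarly whenever their probabilities are similar, whereas the ratio also accounts for how spread out the distribution is at that position.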
