2606.11189v1 Jun 09, 2026 cs.LG

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

Yihang Chen
Yuanhao Ban
Yunqi Hong
Sohyun An
Tong Xie
Cho-Jui Hsieh

Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token may be only one of several valid continuations, may be noisy, or may be misaligned with the model prior. Strictly fitting this one-hot target can therefore be suboptimal, especially when the pretrained model encodes a rich knowledge prior. In this work, we reinterpret SFT as target distribution design: instead of studying only the loss objective, we analyze the token-level target distribution that the loss drives the model to match. We introduce the Q-target framework, which decomposes SFT supervision into two explicit choices: (1) how strongly to rely on the observed token, and (2) how to allocate the remaining probability mass over alternative tokens. This perspective unifies many existing SFT variants as implicit choices of the target distribution Q. Building on this view, we propose Target-SFT, which constructs the training objective directly from the desired target distribution. Target-SFT consistently outperforms the compared baselines across all ten reasoning dataset-model settings evaluated, showing the effectiveness of this target-based approach. Overall, our formulation reveals a more fundamental design principle for SFT and opens a broader search space for SFT objectives.
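The two choices in the Q-target decomposition can be made concrete with a small sketch. It assumes a mixture form Q = α·e_y + (1 − α)·r, where e_y is the one-hot vector on the observed token y and r is a distribution over the other tokens, with training minimizing the cross-entropy between Q and the model distribution. This form and every name below (`q_target`, `target_ce`, `alt_weights`) are illustrative assumptions, not the paper's code or its exact Target-SFT objective. Under this assumption, α = 1 recovers standard SFT, uniform alternative weights give a label-smoothing-style target, and weights proportional to the model's own (detached) probabilities give a prior-aware target.

```python
import math
from typing import Sequence


def q_target(observed: int, alpha: float, alt_weights: Sequence[float]) -> list[float]:
    """Build a hypothetical token-level target Q (illustrative, not the paper's code).

    observed    -- index of the token in the demonstration
    alpha       -- mass placed on the observed token (choice 1)
    alt_weights -- non-negative scores that spread the remaining 1 - alpha
                   mass over the *other* tokens (choice 2)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Exclude the observed token so the leftover mass goes only to alternatives.
    alt = [0.0 if v == observed else max(w, 0.0) for v, w in enumerate(alt_weights)]
    total = sum(alt)
    q = [0.0] * len(alt_weights)
    if total == 0.0:
        # No usable alternatives: fall back to the one-hot (standard SFT) target.
        q[observed] = 1.0
        return q
    for v, w in enumerate(alt):
        q[v] = (1.0 - alpha) * w / total
    q[observed] += alpha
    return q


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def target_ce(logits: Sequence[float], q: Sequence[float]) -> float:
    """Cross-entropy H(Q, p_theta) = -sum_v Q(v) * log p_theta(v)."""
    logp = log_softmax(logits)
    return -sum(qv * lp for qv, lp in zip(q, logp) if qv > 0.0)


# Toy 4-token vocabulary; token 0 is the observed token.
logits = [2.0, 0.5, -1.0, 0.0]
y = 0
sft = q_target(y, 1.0, [1.0] * 4)        # alpha = 1: one-hot, standard SFT
smooth = q_target(y, 0.9, [1.0] * 4)     # uniform over alternatives
prior = [math.exp(lp) for lp in log_softmax(logits)]
prior_q = q_target(y, 0.9, prior)        # alternatives weighted by the model prior
print(target_ce(logits, sft), target_ce(logits, smooth), target_ce(logits, prior_q))
```

The sketch only computes loss values in pure Python. In a real training loop the same target would be built from framework tensors, with the prior-derived weights detached so that gradients flow only through the model's log-probabilities.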
