2606.16152v1 Jun 15, 2026 cs.AI

The Quality-Utility Paradox: Why High-Reward Data Impairs Small Model Mathematical Reasoning

Lei Song
Jiang Bian
Chun Yuan
Lirong Che
Haolong Qian
Xianliang Yang
Yinuo Ma
Feng Lu
Ye Guo

Knowledge distillation from powerful reasoning models is widely used to improve Small Language Models (SLMs) on mathematical reasoning, often under the assumption that traces with higher reward-model scores provide more useful supervision. We identify a counterintuitive Quality-Utility Paradox in mathematical reasoning distillation. Data refined or synthesized by a stronger Oracle obtains higher perceived quality according to reward models, yet consistently underperforms traces generated by the SLM itself and selected through rejection sampling across the Qwen2.5, LLaMA-3, and DeepSeek families. Our analysis shows that Oracle refinement couples logical repair with distributional drift away from the SLM's native reasoning distribution. This drift increases the learner's adaptation cost and can outweigh the benefit of improved reasoning logic. To test this mechanism, we introduce Style-Aligned Refinement, which preserves the native trajectory of the SLM while retaining logical repair from the Oracle. This intervention lowers adaptation cost and restores downstream utility. These findings suggest that effective mathematical reasoning distillation should jointly optimize perceived solution quality and learner-data compatibility, rather than relying solely on reward-model scores. The datasets and code are available at https://github.com/Dracoqhl/Quality-Utility-Paradox.
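
For readers unfamiliar with the self-generated baseline mentioned in the abstract, the following is a minimal sketch of rejection sampling over a model's own traces. It is not the authors' pipeline: the function name `rejection_sample`, the callables `generate` and `extract_answer`, the `question`/`answer` keys, and the exact-match acceptance rule are all assumptions made for illustration, and the paper's actual selection criterion may differ.

```python
from typing import Callable, Dict, List


def rejection_sample(
    problems: List[Dict[str, str]],
    generate: Callable[[str], str],        # hypothetical: samples one trace from the SLM
    extract_answer: Callable[[str], str],  # hypothetical: parses the final answer from a trace
    num_samples: int = 4,
) -> List[Dict[str, str]]:
    """Keep self-generated traces whose final answer matches the reference.

    Assumption: acceptance is an exact match on the final answer.
    """
    kept: List[Dict[str, str]] = []
    for item in problems:
        seen = set()  # skip verbatim duplicate traces for the same problem
        for _ in range(num_samples):
            trace = generate(item["question"])
            if trace in seen:
                continue
            seen.add(trace)
            if extract_answer(trace) == item["answer"]:
                kept.append({"question": item["question"], "trace": trace})
    return kept
```

Because the accepted traces come from the learner itself, they stay within its native reasoning distribution, which is the property the abstract contrasts with Oracle-refined data.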


