2606.16152v1 Jun 15, 2026 cs.AI

The Quality-Utility Paradox: Why High-Reward Data Impairs Small Model Mathematical Reasoning

Lei Song

Jiang Bian

Chun Yuan

Lirong Che

Haolong Qian

Xianliang Yang

Yinuo Ma

Feng Lu

Ye Guo

Knowledge distillation from powerful reasoning models is widely used to improve Small Language Models (SLMs) on mathematical reasoning, often assuming that traces with higher reward model scores provide more useful supervision. We identify a counterintuitive Quality-Utility Paradox in mathematical reasoning distillation. Data refined or synthesized by a stronger Oracle obtains higher perceived quality according to reward models, yet consistently underperforms traces generated by the SLM itself and selected through rejection sampling across Qwen2.5, LLaMA-3, and DeepSeek families. Our analysis shows that Oracle refinement couples logical repair with distributional drift away from the SLM's native reasoning distribution. This drift increases the learner's adaptation cost and can outweigh the benefit of improved reasoning logic. To test this mechanism, we introduce Style-Aligned Refinement, which preserves the native trajectory of the SLM while retaining logical repair from the Oracle. This intervention lowers adaptation cost and restores downstream utility. These findings suggest that effective mathematical reasoning distillation should jointly optimize perceived solution quality and learner-data compatibility, rather than relying solely on reward-model scores. The datasets and code are available at https://github.com/Dracoqhl/Quality-Utility-Paradox.
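
The self-generated baseline mentioned above, traces sampled from the SLM and selected through rejection sampling, can be illustrated with a minimal sketch. The abstract does not state the acceptance criterion, so this sketch assumes a trace is kept when its final answer matches a reference answer. The names `extract_final_answer`, `rejection_sample`, `sample_fn`, and the `Answer:` marker convention are hypothetical and are not taken from the paper or its repository.

```python
from typing import Callable, List


def extract_final_answer(trace: str) -> str:
    """Return the text after the last 'Answer:' marker (assumed convention)."""
    marker = "Answer:"
    idx = trace.rfind(marker)
    if idx == -1:
        return ""  # no marker found: treat as having no answer
    return trace[idx + len(marker):].strip()


def rejection_sample(
    problem: str,
    reference_answer: str,
    sample_fn: Callable[[str], str],
    num_samples: int = 8,
) -> List[str]:
    """Draw num_samples traces from sample_fn and keep those whose final
    answer matches the reference (assumed acceptance rule)."""
    kept = []
    for _ in range(num_samples):
        trace = sample_fn(problem)
        if extract_final_answer(trace) == reference_answer.strip():
            kept.append(trace)
    return kept


def make_stub_sampler(traces: List[str]) -> Callable[[str], str]:
    """Stand-in for an SLM: returns canned traces in order (illustration only)."""
    it = iter(traces)

    def sample(problem: str) -> str:
        return next(it)

    return sample


canned = [
    "2 + 2 = 5. Answer: 5",
    "2 + 2 = 4. Answer: 4",
    "Adding gives 4. Answer: 4",
]
kept = rejection_sample("What is 2 + 2?", "4", make_stub_sampler(canned), num_samples=3)
print(len(kept))
```

In practice `sample_fn` would wrap a call to the SLM itself, so the kept traces stay within the model's native reasoning distribution, which is the property the abstract contrasts with Oracle-refined data.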
