2606.10279v1 Jun 09, 2026 cs.AI

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

Cheng Qian
Yiwei Wang
Buxin Su
University of Pennsylvania
Bingxin Zhao
Bingxuan Li
Jinran Jin

Supervised fine-tuning with synthetic rationale data is widely assumed to improve language model performance on clinical prediction tasks by teaching models not just what to predict but why. We test this assumption on five-year Alzheimer's disease and related dementias (ADRD) prediction from longitudinal health histories. Across a large-scale controlled experiment of 504 configurations, we find that rationale-based SFT consistently and substantially hurts prediction performance relative to label-only fine-tuning. The degradation persists across model families and data scales, and is not resolved by using a reasoning-oriented base model. Crucially, the failure is not explained by poor rationale quality: human expert annotation confirms that the generated rationales are medically accurate and faithfully grounded in patient-specific evidence, and few-shot experiments show that the same rationales improve performance when used as inference-time demonstrations rather than training targets. We identify the root cause as a structural conflict between narrative plausibility and discriminative optimization. We hope our work paves the way for a more precise understanding of when rationale-based supervision helps and when it does not, guiding the responsible development of language models for high-stakes clinical prediction.


