2605.29491v1 May 28, 2026 cs.AI

The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF

Wentao Zhang
Wentao Zhang
Citations: 20
h-index: 3
Tianle Chen
Tianle Chen
Citations: 112
h-index: 3
Xiaolu Zhang
Xiaolu Zhang
Citations: 1,075
h-index: 5
Zhankai Xu
Zhankai Xu
Citations: 42
h-index: 3
Zeli Su
Zeli Su
Citations: 7
h-index: 1
Longfei Zheng
Longfei Zheng
Ant Financial Services Group
Citations: 466
h-index: 11
Jun Zhou
Jun Zhou
Citations: 1,106
h-index: 5

Large Language Models (LLMs) are increasingly deployed in agentic and retrieval-augmented generation (RAG) systems, where they must execute user-specified tasks over externally provided reference text. In practice, such context is often unstructured and contaminated with benign but instruction-like semantic noise, such as editorial comments and system traces, which should be treated strictly as data. We introduce DistractionIF, a benchmark designed to evaluate robustness against such distractor instructions in reference text. Across a broad range of models, we observe a consistent inverse scaling phenomenon: larger models are often less robust, with performance dropping by up to 30 points as scale increases. Mechanistically, our perplexity analysis reveals that scaling erodes the probabilistic boundary between robust and distracted behaviors, making models increasingly prone to over-interpreting noise as instructions. To address this, we demonstrate that reinforcement learning, specifically Group Relative Policy Optimization (GRPO), can restore this boundary, improving robustness by up to 15.5% without compromising general instruction-following capability. Our findings highlight a critical instruction-following robustness gap in reference-grounded tasks and establish reinforcement learning as a promising path for enforcing strict data-instruction separation at scale.

0 Citations
0 Influential
5.5 Altmetric
27.5 Score
Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

Log in to request an AI analysis.

댓글

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!