2606.05633v1 Jun 04, 2026 cs.AI

Answer Presence Drives RAG Rewriting Gains

Yuejie Li
Ke Yang
Yue-Yang He
Bolin Chen
Bowen Li
Chengjun Mao
Yueying Hua
Li Zhang
Ruiqiang Li
Taotao Wang

Retrieval-augmented QA pipelines often route retrieved passages through an LLM \emph{rewriter} before passing them to a smaller reader, lifting F1 by tens of points on multi-hop benchmarks; this gain is typically credited to improved evidence quality. Using a controlled intervention audit, we ask whether that lift is causally driven by the gold answer string appearing in the rewritten context rather than by curation per se. For each rewritten context, we re-run the reader after one of four controlled edits to the compile output: (i) removing the gold answer span, (ii) replacing a length-matched random non-answer span (placebo), (iii) injecting the gold answer at the prefix of a rewrite where it was absent, or (iv) injecting it at a midpoint sentence boundary of such a rewrite. Across twelve completed (cell, baseline) intervention runs spanning three reader families (Qwen2.5-7B, Qwen3.5-35B, GLM-4.7), two datasets (HotpotQA, 2WikiMultihopQA), and three compiler arrangements (MA-only, MB-only, MA$+$verify), removing the gold answer drops reader F1 by $28$ to $64$ points beyond the length-matched placebo on paired \texttt{answer-in-compile} strata, and prepending the gold answer to rewrites that lacked it raises F1 by $+0.7$ to $+9.7$ points in $10$ of $12$ (cell, baseline) combinations. A companion five-sentinel audit shows that the conventional single-\texttt{[MASK]} probe is itself sentinel-fragile: on 2Wiki it reports a $+4.12$~F1 ``non-leakage residual'' that flips to $-3.33$ to $-7.81$~F1 under four alternative sentinels and fails an equivalence test for three of those four ($1/4$~pass). We do not propose a new rewriter or mitigation; we release the intervention runner and the sentinel panel so that other rewriter-gain claims can be tested against the same standard.
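The four controlled edits described in the abstract can be illustrated with a minimal Python sketch. This is not the released intervention runner: the function names, the case-insensitive string matching, the empty-string default for `fill`, and the punctuation-based sentence-boundary rule are all assumptions made for illustration only.

```python
import random
import re


def _gold_spans(context: str, gold: str) -> list[tuple[int, int]]:
    """Character spans of every case-insensitive occurrence of the gold string."""
    return [m.span() for m in re.finditer(re.escape(gold), context, flags=re.IGNORECASE)]


def remove_gold(context: str, gold: str, fill: str = "") -> str:
    """Edit 1: replace every gold answer occurrence with `fill` (assumed default: delete)."""
    return re.sub(re.escape(gold), lambda _m: fill, context, flags=re.IGNORECASE)


def placebo_edit(context: str, gold: str, rng: random.Random, fill: str = "") -> str:
    """Edit 2: replace one random span of len(gold) characters that avoids the gold answer."""
    n = len(gold)
    spans = _gold_spans(context, gold)
    # Candidate start offsets whose window [i, i + n) overlaps no gold occurrence.
    candidates = [
        i for i in range(len(context) - n + 1)
        if all(i + n <= start or i >= end for start, end in spans)
    ]
    if not candidates:
        return context  # nothing to edit without touching the answer
    i = rng.choice(candidates)
    return context[:i] + fill + context[i + n:]


def inject_prefix(context: str, gold: str) -> str:
    """Edit 3: prepend the gold answer to a rewrite that lacks it."""
    return f"{gold} {context}"


def inject_midpoint(context: str, gold: str) -> str:
    """Edit 4: insert the gold answer at the sentence boundary closest to the middle."""
    # Assumed boundary rule: sentence-final punctuation followed by whitespace.
    boundaries = [m.end() for m in re.finditer(r"[.!?]\s+", context)]
    if not boundaries:
        return f"{context} {gold}"  # fallback when no boundary is found
    cut = min(boundaries, key=lambda b: abs(b - len(context) / 2))
    return f"{context[:cut]}{gold} {context[cut:]}"
```

Under these assumptions, the quantity the abstract reports as a drop "beyond the length-matched placebo" would correspond to reader F1 on the `placebo_edit` output minus reader F1 on the `remove_gold` output, computed over the same examples.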
