2605.26403v1 May 26, 2026 cs.AI

From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator

Muzhao Tian
Zisu Huang
Xiaohua Wang
Changze Lv (Fudan University)
Xiaoqing Zheng
Jiakang Yuan (Fudan University)
Kaitao Song
Tao Chen

A long-standing goal of the research community is to develop highly interactive LLM-based dialogue agents. Recent work optimizes dialogue policies either on fixed offline logs (Static Context RL) or through interaction with a prompt-based user simulator (Interactive RL). In this work, we show theoretically that both paradigms are fundamentally limited by context distribution shift: a mismatch between the dialogue histories observed during training and those encountered in real conversations. This shift compounds quadratically over turns and severely degrades dialogue quality. Specifically, we attribute it to two distinct sources: (i) policy-induced shift, which arises from training on static histories rather than self-generated trajectories; and (ii) simulator-induced shift, which stems from discrepancies between simulated and real human behavior. To address these challenges, we propose Calibrated Interactive RL, a unified framework that couples interactive RL with simulator alignment. By aligning the simulator with human interaction patterns, our approach reduces the sim-to-real gap and mitigates compounding distribution shift. Experiments across multiple dialogue tasks confirm our theoretical analysis: (i) Interactive RL significantly outperforms the Static Context baseline by mitigating policy-induced shift; and (ii) calibrating the simulator with our alignment method further bridges the sim-to-real gap, yielding state-of-the-art downstream performance.
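The abstract states that context distribution shift compounds quadratically over turns. The following is only a toy sketch of why that can happen, in the spirit of the familiar compounding-error argument for behavior cloning in imitation learning; it is not the paper's derivation. It assumes (our assumption) that the context mismatch at turn t is bounded by ε·t, because the history at turn t carries t earlier turns that may each have drifted by up to ε. Summing over T turns then gives εT(T+1)/2, i.e. O(εT²), versus εT for a hypothetical baseline with no compounding. The names `per_turn_error`, `cumulative_error`, `no_compounding_error`, and the value of ε are illustrative.

```python
"""Toy illustration (not the paper's analysis) of how a small per-turn
context mismatch can compound quadratically over a multi-turn dialogue."""


def per_turn_error(epsilon: float, t: int) -> float:
    # Assumption: at turn t the history contains t earlier turns, each of
    # which may have drifted by at most epsilon, so the mismatch at turn t
    # is bounded by epsilon * t.
    return epsilon * t


def cumulative_error(epsilon: float, num_turns: int) -> float:
    # Summing the per-turn bound over turns 1..T gives
    # epsilon * T * (T + 1) / 2, i.e. O(epsilon * T^2).
    return sum(per_turn_error(epsilon, t) for t in range(1, num_turns + 1))


def no_compounding_error(epsilon: float, num_turns: int) -> float:
    # Hypothetical baseline: if earlier drift never carried over into later
    # histories, each turn would contribute at most epsilon.
    return epsilon * num_turns


if __name__ == "__main__":
    eps = 0.01  # illustrative per-turn mismatch, not a measured value
    for T in (1, 5, 10, 20):
        print(f"T={T:2d}  linear={no_compounding_error(eps, T):.3f}  "
              f"compounding={cumulative_error(eps, T):.3f}")
```

In this toy model, doubling the dialogue length roughly quadruples the compounding bound for long dialogues, while the non-compounding baseline only doubles; the ratio between the two is (T+1)/2.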
