2605.28069v1 May 27, 2026 cs.AI

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay

Li Wang
Guojun Yin
Xiaohan Wang
Jiajun Chai
Zhexin Hu
Xiaojun Guo
Wei Lin

Adaptive context compression is vital for scaling Large Language Models (LLMs) to complex, multi-turn agent tasks. However, rule-based compression methods may discard task-critical nuances, while Reinforcement Learning (RL) approaches usually struggle to balance information retention and token efficiency under the sparse rewards inherent to long-horizon workflows. To bridge this gap, we propose ZipRL, a novel adaptive compression framework tailored for Reinforcement Learning from Verifiable Rewards (RLVR). ZipRL features a multi-granularity compression mechanism for active, non-uniform information reduction, coupled with Hindsight Response Replay (HRR), a technique designed to densify training signals during RLVR optimization. Theoretically, we prove that ZipRL achieves higher task-relevant utility than uniform compression methods. Concretely, ZipRL uses coarse-to-fine prompts for macro-compression and incorporates HRR into GRPO via generalized advantage reshaping. Experiments on multiple models of varying versions and parameter scales validate the effectiveness of our approach. Benchmarks on five agent tasks show that ZipRL outperforms state-of-the-art approaches by 27.9% and 34.7% on Qwen3-4B and Qwen3-8B, respectively, while maintaining exceptional token efficiency and robustness under extreme 256-turn extrapolation stress tests.
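For context, the group-relative advantage used by standard GRPO can be sketched as follows. This is not ZipRL's generalized advantage reshaping, whose details the abstract does not give; the function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are illustrative assumptions. The sketch only shows why sparse rewards are a problem: when every rollout in a group receives the same reward, all advantages are zero and the group contributes no learning signal, which is the kind of gap a signal-densifying technique such as HRR is meant to address.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantages (assumed form, not ZipRL's variant).

    Each rollout's reward is centered on the group mean and scaled by the
    group standard deviation; eps avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# One successful rollout in a group of four: the success gets a positive
# advantage and the failures get negative ones.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 0.0])

# Sparse-reward case: every rollout failed, so every advantage is zero
# and this group provides no gradient signal.
sparse = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

In the second call every rollout has the same reward, so the centered values are all zero regardless of `eps`.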
