2606.13316v1 Jun 11, 2026 cs.AI

ReSum: Synergizing LLM Reasoning and Summarization with Reinforcement Learning

Yong Wang
Xiangxiang Chu
Shidong Yang
Ziyu Ma
Hailang Huang
Pengkun Wang
Xucong Wang
Renda Li

Reinforcement Learning with Verifiable Rewards (RLVR) is a central technique for improving long-horizon reasoning in Large Language Models (LLMs). However, existing RLVR methods often encourage unnecessarily long reasoning rollouts, which can degrade reasoning coherence and exhaust the available context budget. Existing approaches to long-context organization often depend on external mechanisms to organize rollouts, rather than enabling the model to manage its own reasoning trajectory. To address this limitation, we propose ReSum, a novel RLVR framework that enables LLMs to compress and organize their reasoning trajectories through self-summarization. Our pilot studies show that self-summarization stabilizes generation by lowering token-level entropy, and that introducing a "summarization" phrase can substantially mitigate errors propagated from an incorrect rollout prefix. Motivated by these findings, ReSum adopts a summarization-aware adaptive rollout mechanism that contrastively evaluates whether self-summarization benefits the ongoing reasoning process. Specifically, when the model spontaneously triggers self-summarization, ReSum masks the summarization phrase to create a contrastive branch; at non-summarization positions, it instead randomly injects the phrase to create a matched branch. We further design a summarization-aware advantage to enable finer-grained comparison between contrastive rollout trajectories. Extensive experiments show that ReSum improves performance by an average of 4% while reducing rollout length by 18.6%.
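
The contrastive branching described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the trigger phrase `SUMMARY_PHRASE`, the function names, and the `inject_prob` parameter are all hypothetical, since the abstract does not specify them. The sketch only shows the two cases (mask the phrase when the model produced it, randomly inject it when the model did not) and how token-level entropy of one next-token distribution might be measured.

```python
import math
import random

# Hypothetical trigger phrase; the abstract does not say what phrase is used.
SUMMARY_PHRASE = "In summary,"


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def contrastive_branches(prefix, spontaneously_summarized,
                         inject_prob=0.5, rng=random):
    """Build (original, contrast) rollout prefixes for one position.

    prefix: reasoning text generated so far, without any summarization phrase.
    spontaneously_summarized: True if the model itself emitted the phrase here.
    inject_prob: assumed probability of injecting the phrase otherwise.
    Returns (original, contrast); contrast is None when no branch is created.
    """
    with_summary = prefix + " " + SUMMARY_PHRASE
    if spontaneously_summarized:
        # Model chose to summarize: the contrast branch masks the phrase.
        return with_summary, prefix
    if rng.random() < inject_prob:
        # Model did not summarize: the contrast branch injects the phrase.
        return prefix, with_summary
    return prefix, None
```

Both branches would then be continued by the policy and scored by the verifiable reward, so that their outcomes can be compared; that continuation and the summarization-aware advantage are outside the scope of this sketch.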
