2605.26958v1 May 26, 2026 cs.CL

Tournament-GRPO: Group-Wise Tournament Rewards for Reinforcement Learning in Open-Ended Long-Form Generation

Yan Gao
Yiqun T. Chen
Erhan Zhang
Jiaxin Mao
Xiaochi Wei
Yi Wu
Yao Hu
Wei Yang
Zixuan Yang
Zihang Shen

Reinforcement learning in open-ended long-form generation is challenging because reliable reference answers and automatic metrics are often unavailable. Existing rubric-based methods typically rely on pointwise LLM-as-a-judge scoring, but absolute scores are difficult to calibrate across complex responses, may provide weak discrimination among same-query rollouts, and can become saturated during optimization. We propose Tournament-GRPO, a group-wise reward framework that converts rubric-guided LLM judgments into relative rewards through repeated multi-round tournaments among same-query rollouts. Tournament-GRPO compares candidates within groups, accumulates tournament outcomes, and normalizes them into group-wise rewards for GRPO training. Experiments on Deep Research Bench show that Tournament-GRPO consistently outperforms existing reward-design baselines, achieving a 4.52-point overall-score improvement over the strongest baseline. Further analyses show that tournament rewards provide a favorable effectiveness–efficiency trade-off and that tournament design affects training dynamics. These results suggest that rubric-guided tournament comparison provides an effective reward signal for reinforcement learning in open-ended long-form generation.
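To make the reward pipeline concrete, here is a minimal sketch of turning pairwise judgments into group-wise rewards. The abstract does not specify the bracket format, scoring rule, or normalization, so everything below is an assumption. It uses single-elimination brackets that are reshuffled for each repetition, awards one point per match win, and z-normalizes the accumulated points within the group. The names `Judge`, `run_knockout`, and `tournament_rewards` are hypothetical, and the judge is any callable standing in for a rubric-guided LLM comparison.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical judge: returns True if response `a` beats response `b` under the
# rubric. A real setup would wrap an LLM-as-a-judge call; here it is any callable.
Judge = Callable[[str, str], bool]


def run_knockout(responses: Sequence[str], judge: Judge,
                 rng: random.Random, scores: List[float]) -> None:
    """Play one single-elimination tournament; each match win adds one point."""
    pool = list(range(len(responses)))
    rng.shuffle(pool)  # fresh random bracket for this tournament (assumption)
    while len(pool) > 1:
        next_round: List[int] = []
        if len(pool) % 2 == 1:
            next_round.append(pool.pop())  # bye: advances without earning a point
        for a, b in zip(pool[0::2], pool[1::2]):
            winner = a if judge(responses[a], responses[b]) else b
            scores[winner] += 1.0
            next_round.append(winner)
        pool = next_round


def tournament_rewards(responses: Sequence[str], judge: Judge,
                       num_tournaments: int = 4, seed: int = 0,
                       eps: float = 1e-6) -> List[float]:
    """Accumulate wins over repeated tournaments, then z-normalize within the group."""
    rng = random.Random(seed)
    scores = [0.0] * len(responses)
    for _ in range(num_tournaments):
        run_knockout(responses, judge, rng, scores)
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    # If every rollout scored the same, all rewards collapse to 0.
    return [(s - mean) / (std + eps) for s in scores]


if __name__ == "__main__":
    # Toy judge (assumption, for illustration only): prefer the longer response.
    group = ["short", "a bit longer", "the longest answer here", "mid size"]
    rewards = tournament_rewards(group, lambda a, b: len(a) > len(b))
    for text, r in zip(group, rewards):
        print(f"{r:+.3f}  {text}")
```

Reshuffling the bracket on each repetition means a strong response is not permanently eliminated by one unlucky early pairing. Summing wins across tournaments then produces a graded ranking rather than a single winner. The within-group normalization keeps rewards relative to same-query rollouts, which matches the group-wise framing in the abstract.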
