2605.29697v1 May 28, 2026 cs.AI

Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

Shengxiang Gao
Jianing Yu
Yuchen Liu
Ying Feng
Lixiong Qin
Jiasi Chen
Shenggang Yang
Weiran Xu

In Agentic Search, trajectory-level outcome rewards fail to quantify the behavioral contributions of individual steps, while existing step-level reward methods typically rely on costly tree sampling. We view world knowledge as a latent world graph and each information-seeking (IS) task as a search within a latent task graph, where effective steps should make progress through the graph toward the answer node. Based on this prior, we propose Graph-Distance Contribution Reward (GDCR), a step-level process reward that scores newly retrieved and newly cited entities by their distance to the answer node in a training-time Entity-Relation (ER) graph. We further propose Step Advantage Policy Optimization (SAPO), which converts GDCR into step-level advantages and combines them with trajectory-level outcome advantages. Experiments on four challenging benchmarks validate the effectiveness of our method.
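
To make the idea of a graph-distance step reward concrete, the following is a minimal illustrative sketch, not the paper's formulation. The abstract does not give the exact GDCR or SAPO equations, so everything here is an assumption: the hypothetical scoring rule `1 / (1 + distance)`, taking the maximum over a step's new entities, and the hypothetical `weight` used to mix mean-centred step rewards with a single trajectory-level advantage. The function names `bfs_distances`, `step_rewards`, and `combine_advantages` are likewise invented for illustration.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from `source` to every reachable node (adjacency-list graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def step_rewards(graph, answer, steps):
    """Score each step by the entities it introduces for the first time.

    Assumption (not from the paper): an entity at distance d from the answer
    node scores 1 / (1 + d), and a step's reward is the best score among its
    new entities. Steps with no new reachable entity get 0.0.
    """
    dist = bfs_distances(graph, answer)
    seen = set()
    rewards = []
    for entities in steps:
        new = [e for e in entities if e not in seen]
        seen.update(new)
        scores = [1.0 / (1 + dist[e]) for e in new if e in dist]
        rewards.append(max(scores, default=0.0))
    return rewards


def combine_advantages(step_r, outcome_adv, weight=0.5):
    """Hypothetical mix: shared outcome advantage plus mean-centred step rewards."""
    mean = sum(step_r) / len(step_r)
    return [outcome_adv + weight * (r - mean) for r in step_r]


# Toy ER graph: the chain Q - A - B - ANS, plus a distractor X attached to Q.
toy_graph = {
    "Q": ["A", "X"],
    "A": ["Q", "B"],
    "B": ["A", "ANS"],
    "ANS": ["B"],
    "X": ["Q"],
}
# Entities surfaced at each of three search steps.
toy_steps = [["A", "X"], ["A"], ["B"]]

rewards = step_rewards(toy_graph, "ANS", toy_steps)
advantages = combine_advantages(rewards, outcome_adv=1.0)
```

In this toy trajectory the second step repeats an entity that was already seen, so it earns no step reward, while the third step reaches an entity adjacent to the answer and earns the highest one. Centring the step rewards keeps the average advantage equal to the trajectory-level value, which is one simple way to let step-level signals redistribute credit without changing the overall outcome signal.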


