2605.29697v1 May 28, 2026 cs.AI

Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

Shengxiang Gao

Jianing Yu

Yuchen Liu

Ying Feng

Lixiong Qin

Jiasi Chen

Shenggang Yang

Weiran Xu


In agentic search, trajectory-level outcome rewards fail to quantify how much each individual step contributes, while existing step-level reward methods typically rely on costly tree sampling. We view world knowledge as a latent world graph and each information-seeking (IS) task as a search within a latent task graph, in which an effective step should make progress through the graph toward the answer node. Based on this prior, we propose Graph-Distance Contribution Reward (GDCR), a step-level process reward that scores newly retrieved and newly cited entities by their distance to the answer node in a training-time Entity-Relation (ER) graph. We further propose Step Advantage Policy Optimization (SAPO), which converts GDCR into step-level advantages and combines them with trajectory-level outcome advantages. Experiments on four challenging benchmarks validate the effectiveness of our method.
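
The idea of a graph-distance step reward can be illustrated with a minimal sketch. The abstract does not give the GDCR formula or the SAPO combination rule, so everything specific here is an assumption made for illustration: the reward shape `1 / (1 + distance)`, the mean-centering used to turn rewards into step advantages, the `weight` parameter, and all function names are hypothetical and are not taken from the paper.

```python
from collections import deque


def bfs_distances(graph, target):
    """Hop distance from every reachable node to `target` (undirected adjacency dict)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def step_rewards(graph, answer, steps):
    """Score each step by the closest entity it surfaces for the first time.

    ASSUMPTION: reward = 1 / (1 + distance to the answer node); a step that
    surfaces no new reachable entity gets 0. The paper's formula may differ.
    """
    dist = bfs_distances(graph, answer)
    seen = set()
    rewards = []
    for entities in steps:
        new = [e for e in entities if e not in seen]
        seen.update(new)
        reachable = [dist[e] for e in new if e in dist]
        rewards.append(1.0 / (1 + min(reachable)) if reachable else 0.0)
    return rewards


def step_advantages(rewards):
    """ASSUMPTION: center rewards on the trajectory mean to get step advantages."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def combine_advantages(step_adv, outcome_adv, weight=0.5):
    """ASSUMPTION: add a weighted step advantage to the shared outcome advantage."""
    return [outcome_adv + weight * a for a in step_adv]


# Toy ER graph: Q - A - C - ANS, with B as a dead end off Q.
graph = {
    "Q": ["A", "B"],
    "A": ["Q", "C"],
    "B": ["Q"],
    "C": ["A", "ANS"],
    "ANS": ["C"],
}
# Entities surfaced at each step of one hypothetical search trajectory.
steps = [["A", "B"], ["A"], ["C"], ["ANS"]]

rewards = step_rewards(graph, "ANS", steps)
combined = combine_advantages(step_advantages(rewards), outcome_adv=1.0)
```

In this toy trajectory the second step revisits an entity it has already seen and earns no reward, while later steps that move closer to the answer node earn progressively more, so the steps receive different advantages even though they share one trajectory-level outcome.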
