2606.05784v1 Jun 04, 2026 cs.AI

TAPO: Tool-Aware Policy Optimization via Credit Transfer for Multimodal Search Agents

Guojun Yin
Hang He
Xiaohan Wang
Jiajun Chai
Chuhuai Yue
Fenghe Tang
University of Science and Technology of China
S. K. Zhou
Chengqi Dong
Yandong Liu

We identify and formally characterize credit misassignment as a systematic failure mode of GRPO in tool-augmented multimodal search agents: its uniform broadcast of trajectory-level advantages to all tokens causes valuable tool-use steps in failing trajectories to be penalized no differently from valueless ones. We further empirically quantify the scale of this phenomenon. Over half of failing trajectories and failing tool-use actions exhibit correctable credit misassignment, demonstrating that the wasted training signal is both substantial and structurally exploitable. Building on this insight, we propose Tool-Aware Policy Optimization (TAPO), which exploits the parameter-determinism property of information-acquisition tools: similar call parameters define equivalent information-acquisition actions and should therefore share comparable action credit. TAPO constructs counterfactual witnesses within the current training batch and compensates misassigned negative credit via confidence-gated conservative advantage correction. It requires no additional annotation, models, or sampling, and introduces negligible computational overhead. Across multiple multimodal search benchmarks, TAPO delivers consistent, plug-and-play improvements over strong baselines for three mainstream RL algorithms (GRPO, GSPO, and SAPO). Our code and models will be publicly released upon acceptance.
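To make the credit-misassignment argument concrete, the following is a minimal sketch, not the authors' implementation. The first function is the standard GRPO group-relative advantage, which assigns one scalar to every step of a trajectory. The second is a hypothetical illustration of the credit-transfer idea: a tool call in a failing trajectory that closely matches a call in a succeeding trajectory from the same group (a "witness") has part of its penalty removed. The abstract does not specify the similarity measure, the gating rule, or the correction size, so the token-set Jaccard similarity and the names `sim_threshold`, `max_relief`, and `corrected_step_advantages` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Standard GRPO: one group-normalized advantage per trajectory."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def jaccard(a, b):
    """Token-set similarity between two tool-call strings.

    Assumption: a stand-in for whatever parameter-similarity
    measure the paper actually uses.
    """
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def corrected_step_advantages(group, sim_threshold=0.8, max_relief=0.5):
    """Hypothetical per-step advantage correction within one sampled group.

    group: list of {"reward": float, "tool_calls": [str, ...]}
    Returns one list of per-tool-call advantages per trajectory.
    """
    advs = grpo_advantages([t["reward"] for t in group])
    # Witnesses: tool calls taken from trajectories with positive advantage.
    witnesses = [
        call
        for traj, adv in zip(group, advs)
        if adv > 0
        for call in traj["tool_calls"]
    ]
    corrected = []
    for traj, adv in zip(group, advs):
        steps = []
        for call in traj["tool_calls"]:
            step_adv = adv  # GRPO default: uniform broadcast to every step
            if adv < 0 and witnesses:
                confidence = max(jaccard(call, w) for w in witnesses)
                if confidence >= sim_threshold:
                    # Conservative: shrink the penalty, never flip its sign
                    # (holds because max_relief * confidence < 1).
                    step_adv = adv * (1.0 - max_relief * confidence)
            steps.append(step_adv)
        corrected.append(steps)
    return corrected
```

In this sketch, a failing trajectory that repeats a succeeding trajectory's search call receives a smaller penalty on that call, while its unrelated calls keep the full broadcast penalty. Everything is computed from the trajectories already in the group, which mirrors the abstract's claim that no additional annotation, models, or sampling are needed.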
