2605.29568v1 May 28, 2026 cs.AI

DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning

Xiao Ding
Xiao Ding
Citations: 1,264
h-index: 17
Bibo Cai
Bibo Cai
Citations: 94
h-index: 6
Kai Xiong
Kai Xiong
Research Center for Social Computing and Information Retrieval
Citations: 406
h-index: 9
Zhouhao Sun
Zhouhao Sun
Citations: 87
h-index: 5
Bing Qin
Bing Qin
Citations: 411
h-index: 12
Ting Liu
Ting Liu
Citations: 1,454
h-index: 15
Yangfan He
Yangfan He
Citations: 149
h-index: 6
Yufei Zhang
Yufei Zhang
Citations: 35
h-index: 3

Tool-Integrated Reasoning (TIR) extends LLM capabilities by leveraging external environments. However, existing methods lack the deliberation during sequential tool invocation required for strategic planning and self-correction. While RL mitigates this, conventional approaches for Tool-Integrated Reasoning are hindered by sparse outcome-based rewards, failing to supervise intermediate reasoning steps and tool invocations. To address this, we propose DeepTool, a novel framework that scales deliberate thinking within the interleaved process of thinking, action, and observation at each turn. In DeepTool, we first introduce a synthesis pipeline that evolves extended thinking into interleaved trajectories, integrating adversarial perturbations to ensure robustness and self-correction. Secondly, we devise Process-Supervised Reinforcement Learning based on GRPO, which utilizes an Action-Centric Process Reward to reinforce intermediate interleaved thinking and enforce precise tool invocation at every turn. Extensive experiments demonstrate that DeepTool achieves superior performance, boosting Qwen2.5-7B significantly across six benchmarks (e.g., AIME24: 3.2% -> 40.4% and HMMT25: 0.0% -> 28.6%). Furthermore, the token cost-effectiveness analysis confirms the utility of interleaved thinking, demonstrating DeepTool's optimal balance between performance and token efficiency.

0 Citations
0 Influential
8.5 Altmetric
42.5 Score
Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

Log in to request an AI analysis.

댓글

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!