2606.16215v1 Jun 15, 2026 cs.CL

PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents

Yingbin Liang
Kejing Xia
Zhenbang Du
Xiangchi Yuan
Qirui Jin
Wenke Lee
Shaofeng Zou
Dachuan Shi
Jun Luo
Zhiwei Zheng
Qijia He

Multi-turn tool-use agents must reason, call tools, and adapt to observations across several interaction turns. Post-training such agents is challenging: reinforcement learning matches the prompt-only inference setting but often suffers from sparse rewards and weak credit assignment, while supervised fine-tuning on expert traces provides dense process supervision but can over-constrain the model to fixed trajectories. To address this, we propose PACT, a Privileged trAce Co-Training framework for multi-turn tool-use agents. The key idea is to use expert traces only as training-time optimization signals rather than as rollout-time hints. PACT keeps rollout generation prompt-only, then uses expert traces to guide optimization through two complementary signals: a trace-conditioned RL surrogate that evaluates prompt-only rollouts under expert-trace context, and a component-aware SFT loss that supervises reasoning prefixes and tool calls with annealed strength. To reduce over-reliance on the training-only trace context, PACT further introduces prompt-only anchoring. We also provide a latent-trace view that connects the two trace-based objectives and explains how expert traces can guide optimization without being used during rollout generation. Experiments on FTRL, BFCL, and ToolHop show that PACT consistently improves over strong SFT- and RL-based baselines, highlighting the value of privileged trace co-training for multi-turn tool-use learning.
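
The abstract names three training-time signals (a trace-conditioned RL surrogate, a component-aware SFT loss with annealed strength, and prompt-only anchoring) but does not give their formulas. The sketch below is only one plausible way such terms could be combined into a single scalar loss. Everything in it is an assumption made for illustration, not the authors' method: the linear annealing schedule, the per-component weights, the additive combination, and the names `annealed_weight`, `component_sft_loss`, `combined_objective`, and `anchor_coef` are all hypothetical.

```python
def annealed_weight(step, total_steps, start=1.0, end=0.0):
    """Linearly interpolate from `start` to `end` over training.

    Assumption: the abstract only says the SFT strength is annealed;
    the linear shape is chosen here for illustration.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def component_sft_loss(token_logprobs, components, weights):
    """Weighted negative log-likelihood over expert-trace tokens.

    `components` tags each token (e.g. "reasoning" or "tool_call");
    tokens whose tag is missing from `weights` contribute nothing.
    The tags and weights are hypothetical.
    """
    total, norm = 0.0, 0.0
    for logprob, tag in zip(token_logprobs, components):
        w = weights.get(tag, 0.0)
        total += -w * logprob
        norm += w
    return total / norm if norm > 0 else 0.0


def combined_objective(rl_surrogate_loss, sft_loss, anchor_loss,
                       step, total_steps, anchor_coef=0.1):
    """Add the three terms, with the SFT term scaled by the annealed weight.

    Assumption: a plain weighted sum; the paper may combine them differently.
    """
    sft_coef = annealed_weight(step, total_steps)
    return rl_surrogate_loss + sft_coef * sft_loss + anchor_coef * anchor_loss


if __name__ == "__main__":
    sft = component_sft_loss(
        token_logprobs=[-1.0, -2.0, -3.0],
        components=["reasoning", "tool_call", "other"],
        weights={"reasoning": 1.0, "tool_call": 2.0},
    )
    print(combined_objective(1.0, sft, 3.0, step=50, total_steps=100))
```

Under these assumptions the SFT term dominates early in training and fades out as the annealed weight approaches `end`, which leaves the RL surrogate and the anchoring term to shape the final policy.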

