2606.16215v1 Jun 15, 2026 cs.CL

PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents

Yingbin Liang

Kejing Xia

Zhenbang Du

Xiangchi Yuan

Qirui Jin

Wenke Lee

Shaofeng Zou

Dachuan Shi

Jun Luo

Zhiwei Zheng

Qijia He

Multi-turn tool-use agents must reason, call tools, and adapt to observations across several interaction turns. Post-training such agents is challenging. Reinforcement learning matches the prompt-only inference setting but often suffers from sparse rewards and weak credit assignment, while supervised fine-tuning on expert traces provides dense process supervision but can over-constrain the model to fixed trajectories. To address this, we propose PACT, a Privileged trAce Co-Training framework for multi-turn tool-use agents. The key idea is to use expert traces only as training-time optimization signals rather than as rollout-time hints. PACT keeps rollout generation prompt-only, then uses expert traces to guide optimization through two complementary signals: a trace-conditioned RL surrogate that evaluates prompt-only rollouts under expert-trace context, and a component-aware SFT loss that supervises reasoning prefixes and tool calls with annealed strength. To reduce over-reliance on the training-only trace context, PACT further introduces prompt-only anchoring. We also provide a latent-trace view that connects the two trace-based objectives and explains how expert traces can guide optimization without being used during rollout generation. Experiments on FTRL, BFCL, and ToolHop show that PACT consistently improves over strong SFT- and RL-based baselines, highlighting the value of privileged trace co-training for multi-turn tool-use learning.
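
The abstract names three training-time terms (a trace-conditioned RL surrogate, a component-aware SFT loss with annealed strength, and prompt-only anchoring) but gives no formulas. The sketch below is only an illustration of how such terms might be combined into one scalar objective. Everything in it is an assumption rather than the authors' method: the linear annealing schedule, the additive combination, the per-component weights, and all names (`annealed_weight`, `component_sft_loss`, `pact_style_loss`, `anchor_coef`) are hypothetical.

```python
def annealed_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Hypothetical linear schedule for the SFT strength (assumed, not from the paper)."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return w_start + (w_end - w_start) * frac


def component_sft_loss(token_logprobs, components, weights):
    """Weighted negative log-likelihood over expert-trace tokens.

    token_logprobs: log-probabilities the model assigns to each expert token.
    components:     a label per token, e.g. "reasoning" or "tool_call" (assumed labels).
    weights:        per-component weight; tokens with unlisted labels get weight 0.
    """
    total, norm = 0.0, 0.0
    for logprob, component in zip(token_logprobs, components):
        w = weights.get(component, 0.0)
        total += -w * logprob
        norm += w
    return total / norm if norm > 0 else 0.0


def pact_style_loss(rl_trace_cond, rl_prompt_only, sft, step, total_steps,
                    anchor_coef=0.5):
    """Assumed additive combination of the three terms named in the abstract.

    rl_trace_cond:  RL surrogate evaluated under expert-trace context.
    rl_prompt_only: the same surrogate evaluated prompt-only (the anchoring term).
    sft:            component-aware SFT loss, scaled by the annealed weight.
    """
    lam = annealed_weight(step, total_steps)
    return rl_trace_cond + anchor_coef * rl_prompt_only + lam * sft
```

Under these assumptions, the SFT term dominates early in training and fades as `step` approaches `total_steps`, while the prompt-only term stays active throughout so the policy does not come to depend on a context that is unavailable at inference time.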
