PACT: Privileged Trace Co-Training Boosts Multi-Turn Tool-Use Agents for Enterprise Automation

PACT (Privileged Trace Co-Training) addresses challenges in training multi-turn tool-use agents by using expert traces as training-time optimization signals, not rollout hints. It combines a trace-conditioned RL surrogate with a component-aware SFT loss and shows consistent gains over strong SFT- and RL-based baselines on three benchmarks.

iGEN Editorial

June 17, 2026


Multi-turn tool-use agents must reason, call external tools, and adapt to observations across several interaction turns. Post-training such agents is challenging: reinforcement learning (RL) often suffers from sparse rewards and weak credit assignment, while supervised fine-tuning (SFT) on expert traces provides dense process supervision but can over-constrain the model to fixed trajectories. Researchers have proposed PACT (Privileged Trace Co-Training) to tackle this problem, offering a new approach that keeps rollout generation prompt-only while using expert traces exclusively as training-time optimization signals.

The Challenge of Training Tool-Use Agents

Tool-use agents are AI systems that can invoke external APIs, databases, or software tools to complete tasks. In multi-turn settings, they must maintain context across several steps, which makes training difficult. According to the paper, RL matches the prompt-only setting used at inference but suffers from sparse rewards and weak credit assignment. SFT on expert traces provides dense process supervision but can over-constrain the model, pushing it to follow fixed trajectories rather than explore alternative solutions.

PACT: A New Co-Training Framework

PACT introduces two complementary signals that use expert traces to guide optimization without using them during rollout generation. First, a trace-conditioned RL surrogate evaluates prompt-only rollouts under the context of expert traces. Second, a component-aware SFT loss supervises reasoning prefixes and tool-calls with annealed strength. To reduce over-reliance on the training-only trace context, PACT further incorporates a prompt-only anchoring mechanism. The researchers also provide a latent-trace view that connects the two trace-based objectives and explains how expert traces can guide optimization without being used during rollout.

Training Method	Strengths	Weaknesses
Reinforcement Learning (RL)	Matches prompt-only inference	Sparse rewards, weak credit assignment
Supervised Fine-Tuning (SFT)	Dense process supervision	Over-constrains to fixed trajectories
PACT (this work)	Combines both signals, prompt-only rollout	None reported in source
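
The way these signals might fit together can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the article does not give the paper's loss formulas, so the linear annealing schedule, the REINFORCE-style surrogate, the component labels ("reasoning", "tool_call"), the weights, and every function name below are hypothetical. The sketch assumes the rollout was sampled from the prompt alone and is then scored twice: once with the expert trace in context, and once prompt-only as the anchoring term.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class ExpertToken:
    component: str  # hypothetical labels: "reasoning" or "tool_call"
    logp: float     # log-probability the policy assigns to this expert token


def annealed_weight(step: int, total_steps: int,
                    start: float = 1.0, end: float = 0.1) -> float:
    """Linearly decay the SFT strength from `start` to `end` (assumed schedule)."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start * (1.0 - frac) + end * frac


def component_sft_loss(tokens: Sequence[ExpertToken],
                       weights: Dict[str, float]) -> float:
    """Weighted mean negative log-likelihood, with one weight per trace component."""
    total = sum(-weights.get(t.component, 0.0) * t.logp for t in tokens)
    norm = sum(weights.get(t.component, 0.0) for t in tokens)
    return total / norm if norm > 0 else 0.0


def policy_gradient_surrogate(logps: Sequence[float],
                              advantages: Sequence[float]) -> float:
    """REINFORCE-style surrogate: minus the mean of advantage * log-probability."""
    if len(logps) != len(advantages):
        raise ValueError("logps and advantages must have the same length")
    if not logps:
        return 0.0
    return -sum(a * lp for a, lp in zip(advantages, logps)) / len(logps)


def combined_loss(trace_logps: Sequence[float],
                  prompt_logps: Sequence[float],
                  advantages: Sequence[float],
                  expert_tokens: Sequence[ExpertToken],
                  step: int,
                  total_steps: int,
                  anchor_coef: float = 0.5,
                  component_weights: Optional[Dict[str, float]] = None) -> float:
    """Combine the three signals; coefficients are illustrative assumptions."""
    weights = component_weights or {"reasoning": 0.5, "tool_call": 1.0}
    # Same prompt-only rollout, scored with the expert trace in context.
    rl_trace = policy_gradient_surrogate(trace_logps, advantages)
    # Same rollout scored prompt-only: the anchoring term.
    rl_anchor = policy_gradient_surrogate(prompt_logps, advantages)
    # Dense supervision on the expert trace, faded out over training.
    sft = component_sft_loss(expert_tokens, weights)
    return rl_trace + anchor_coef * rl_anchor + annealed_weight(step, total_steps) * sft
```

In this sketch the SFT term dominates early in training and fades as `step` approaches `total_steps`, which is one plausible reading of "annealed strength"; the actual schedule and weighting used in the paper may differ.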

Experimental Results

The team evaluated PACT on three benchmarks: FTRL, BFCL, and ToolHop. On all three, PACT consistently improved over strong SFT- and RL-based baselines. The paper presents these results as evidence for privileged trace co-training in multi-turn tool-use learning: expert traces can serve as training-time optimization signals without being supplied to the model during rollout or inference.

Implications for Enterprise Automation

While the research is primarily academic, the ability to train more robust multi-turn tool-use agents is relevant to enterprise technology. Such agents could automate complex workflows in supply chain management, trade documentation, and logistics, where systems must reason, call multiple APIs, and adapt to changing observations. The PACT framework targets a limitation that has held back wider deployment of AI agents in production: balancing exploration against adherence to expert knowledge.

The paper is authored by Du, Zhenbang; Luo, Jun; Zheng, Zhiwei; Yuan, Xiangchi; Kejing; Shi, Dachuan; Jin, Qirui; He, Qijia; Zou, Shaofeng; Liang, Yingbin; and Lee, Wenke. It is available on arXiv under the identifier 2606.16215.
