
PACT: Privileged Trace Co-Training Boosts Multi-Turn Tool-Use Agents for Enterprise Automation

PACT (Privileged Trace Co-Training) addresses challenges in training multi-turn tool-use agents by using expert traces as optimization signals, not rollout hints. It combines a trace-conditioned RL surrogate and component-aware SFT loss, showing consistent gains over strong baselines on multiple benchmarks.

iGEN Editorial
June 17, 2026

Multi-turn tool-use agents must reason, call external tools, and adapt to observations across several interaction turns. Post-training such agents is challenging: reinforcement learning (RL) often suffers from sparse rewards and weak credit assignment, while supervised fine-tuning (SFT) on expert traces provides dense process supervision but can over-constrain the model to fixed trajectories. Researchers have proposed PACT (Privileged Trace Co-Training) to tackle this problem, offering a new approach that keeps rollout generation prompt-only while using expert traces exclusively as training-time optimization signals.

The Challenge of Training Tool-Use Agents

Tool-use agents are AI systems that can invoke external APIs, databases, or software tools to complete tasks. In multi-turn settings, they must maintain context across several steps, which makes training difficult. According to the research paper, RL methods match the prompt-only setting used at inference but suffer from sparse rewards and weak credit assignment: when feedback arrives only at the end of an episode, it is hard to tell which turn helped or hurt. SFT on expert traces provides dense process supervision but can over-constrain the model, pushing it to follow fixed trajectories rather than explore alternative solutions.
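To make the multi-turn setting concrete, here is a minimal Python sketch of a tool-use episode with a single end-of-episode reward. It is an illustration only, not code from the paper: the tool names (`lookup_stock`, `reserve`), the scripted policy standing in for a model, and the reward rule are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; a real agent would call external APIs here.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_stock": lambda sku: {"A1": "12", "B2": "0"}.get(sku, "unknown"),
    "reserve": lambda sku: f"reserved {sku}",
}

History = List[Tuple[str, str]]  # (tool name, observation) per turn


def scripted_policy(history: History) -> Tuple[str, str]:
    """Stand-in for a model: choose the next (tool, argument) from the history."""
    if not history:
        return ("lookup_stock", "A1")
    last_tool, last_obs = history[-1]
    if last_tool == "lookup_stock" and last_obs not in ("0", "unknown"):
        return ("reserve", "A1")
    return ("stop", "")


def run_episode(policy: Callable[[History], Tuple[str, str]],
                max_turns: int = 4) -> Tuple[History, float]:
    """Roll out one episode: act, observe, repeat, then score once at the end."""
    history: History = []
    for _ in range(max_turns):
        tool, arg = policy(history)
        if tool == "stop":
            break
        observation = TOOLS[tool](arg)
        history.append((tool, observation))
    # Sparse reward (assumed rule): one scalar for the whole episode,
    # with no per-turn signal about which call mattered.
    success = bool(history) and history[-1][1].startswith("reserved")
    return history, 1.0 if success else 0.0


if __name__ == "__main__":
    print(run_episode(scripted_policy))
```

Because the only feedback is the final scalar, nothing in the reward itself says whether the lookup or the reservation was the decisive step, which is the credit-assignment difficulty described above.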

PACT: A New Co-Training Framework

PACT introduces two complementary signals that use expert traces to guide optimization without using them during rollout generation. First, a trace-conditioned RL surrogate evaluates prompt-only rollouts under the context of expert traces. Second, a component-aware SFT loss supervises reasoning prefixes and tool-calls with annealed strength. To reduce over-reliance on the training-only trace context, PACT further incorporates a prompt-only anchoring mechanism. The researchers also provide a latent-trace view that connects the two trace-based objectives and explains how expert traces can guide optimization without being used during rollout.
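One way to picture how these pieces could fit together is a weighted sum in which the SFT term fades over training. The Python sketch below is a guess at the general shape, not the paper's formulation: the additive form, the linear annealing schedule, the scale values, and every function and parameter name are assumptions made for illustration.

```python
def annealed_weight(step: int, total_steps: int,
                    start: float = 1.0, end: float = 0.0) -> float:
    """Linearly interpolate from `start` to `end` over training.

    The linear schedule is an assumption; the article only says the
    SFT supervision is applied "with annealed strength".
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def combined_loss(rl_surrogate: float,
                  sft_reasoning: float,
                  sft_tool_call: float,
                  anchor: float,
                  step: int,
                  total_steps: int,
                  reasoning_scale: float = 0.5,
                  tool_scale: float = 1.0,
                  anchor_scale: float = 0.1) -> float:
    """Combine the three signals into one scalar (hypothetical weighting).

    rl_surrogate  : loss from the trace-conditioned RL surrogate
    sft_reasoning : SFT loss on reasoning prefixes
    sft_tool_call : SFT loss on tool calls
    anchor        : prompt-only anchoring term
    """
    # Component-aware: reasoning and tool-call tokens get separate scales.
    sft = reasoning_scale * sft_reasoning + tool_scale * sft_tool_call
    return (rl_surrogate
            + annealed_weight(step, total_steps) * sft
            + anchor_scale * anchor)


if __name__ == "__main__":
    for step in (0, 50, 100):
        print(step, combined_loss(1.0, 2.0, 3.0, 4.0, step, 100))
```

Under these assumed values the SFT term dominates early and vanishes by the final step, leaving the RL surrogate and the anchoring term; the actual objective and schedule should be taken from the paper itself.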

Training Method | Strengths | Weaknesses
Reinforcement Learning (RL) | Matches prompt-only inference | Sparse rewards, weak credit assignment
Supervised Fine-Tuning (SFT) | Dense process supervision | Over-constrains to fixed trajectories
PACT (this work) | Combines both signals, prompt-only rollout | None reported in source

Experimental Results

The team evaluated PACT on three benchmarks: FTRL, BFCL, and ToolHop. Across all three, PACT consistently improved over strong SFT- and RL-based baselines. The paper highlights the value of privileged trace co-training for multi-turn tool-use learning, showing that expert traces can be effectively used as optimization signals without being revealed during inference.

Implications for Enterprise Automation

While the research is primarily academic, the ability to train more robust multi-turn tool-use agents has direct relevance for enterprise technology. Such agents could automate complex workflows in supply chain management, trade documentation, and logistics, where systems must reason, call multiple APIs, and adapt to changing observations. The PACT framework targets a tension that complicates the deployment of AI agents in production: balancing exploration against adherence to expert knowledge.

The paper is authored by Du, Zhenbang; Luo, Jun; Zheng, Zhiwei; Yuan, Xiangchi; Kejing; Shi, Dachuan; Jin, Qirui; He, Qijia; Zou, Shaofeng; Liang, Yingbin; and Lee, Wenke. It is available on arXiv under the identifier 2606.16215.
