Artificial Intelligence · #on-policy distillation #curriculum learning
New Algorithm for Multi-Turn AI Agents Reduces Compounding Errors in Knowledge Distillation
A new algorithm, Guided On-Policy Distillation (Guided-OPD), targets a known failure mode in multi-turn tasks: small student models compound their own errors from one turn to the next. The method mixes teacher and student turns and follows a curriculum that gradually reduces teacher intervention. Compared with vanilla OPD, it improves average score by 21.1% and success rate by 25.5%.
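To make the mechanism more concrete, here is a minimal Python sketch of a mixed teacher/student rollout under stated assumptions. Each turn is handed to the teacher with a probability that decays linearly over training, and only student-generated turns are marked for the distillation loss. The linear schedule, the per-turn coin flip, the loss masking, and every name here (`teacher_prob`, `guided_rollout`, `student_turn_mask`) are hypothetical illustrations, not the published Guided-OPD implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# A policy maps the conversation so far to the next turn (hypothetical interface).
Policy = Callable[[List[str]], str]


@dataclass
class Turn:
    speaker: str  # "teacher" or "student"
    text: str


def teacher_prob(step: int, total_steps: int,
                 p_start: float = 0.5, p_end: float = 0.0) -> float:
    """Hypothetical linear curriculum: teacher intervention decays from p_start to p_end."""
    if total_steps <= 1:
        return p_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return p_start + (p_end - p_start) * frac


def guided_rollout(history: List[str], num_turns: int, step: int, total_steps: int,
                   teacher: Policy, student: Policy, rng: random.Random,
                   p_start: float = 0.5, p_end: float = 0.0) -> List[Turn]:
    """Generate a multi-turn trajectory where each turn comes from the teacher
    with the current curriculum probability, otherwise from the student (assumption)."""
    p_teacher = teacher_prob(step, total_steps, p_start, p_end)
    context = list(history)
    turns: List[Turn] = []
    for _ in range(num_turns):
        use_teacher = rng.random() < p_teacher
        policy, speaker = (teacher, "teacher") if use_teacher else (student, "student")
        text = policy(context)
        turns.append(Turn(speaker, text))
        # Later turns condition on what was actually generated, so a teacher turn
        # can steer the student back on track before its errors accumulate.
        context.append(text)
    return turns


def student_turn_mask(turns: List[Turn]) -> List[bool]:
    """Assumption: the distillation loss is applied only to student-generated turns."""
    return [t.speaker == "student" for t in turns]


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_teacher(ctx: List[str]) -> str:
        return f"teacher-action-{len(ctx)}"

    def toy_student(ctx: List[str]) -> str:
        return f"student-action-{len(ctx)}"

    for step in (0, 5, 10):
        traj = guided_rollout(["task"], 4, step, 11, toy_teacher, toy_student, rng)
        print(step, [t.speaker for t in traj], student_turn_mask(traj))
```

Sweeping `step` from 0 to `total_steps - 1` moves training from mixed trajectories toward fully on-policy ones. A linear decay is only one possible choice; the article does not say which schedule shape Guided-OPD uses.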
Jun 16, 2026 · 1 source