Artificial Intelligence #reinforcement learning #diversity collapse
Diversity Collapse in RLVR Explained by Overtraining in New Study
A new arXiv paper by Yuan et al. (2026) explains diversity collapse in reinforcement learning with verifiable rewards (RLVR) as a symptom of overtraining. The study shows that once a problem's contribution to the reasoning boundary saturates, further updates on that problem concentrate probability mass on already-successful trajectories, which degrades Pass@k (the probability that at least one of k sampled solutions is correct) at high k. The authors propose Bayesian Boundary Gating (BBG) to redirect optimization away from saturated problems, and they report improved average Pass@k across multiple benchmarks.
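To make the high-k Pass@k claim concrete, here is a minimal sketch of the standard unbiased Pass@k estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. It is not the paper's code and does not implement BBG. The function names and the two sets of per-problem correct counts are hypothetical, chosen only to show that two models can share the same Pass@1 while differing sharply at larger k.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate from n samples, c of which are correct."""
    if k > n:
        raise ValueError("k must not exceed n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n: int, k: int) -> float:
    """Average Pass@k over problems, given correct-sample counts per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical data (not from the paper): 4 problems, n = 16 samples each.
n = 16
spread = [4, 4, 4, 4]      # every problem is solved occasionally
collapsed = [16, 0, 0, 0]  # one problem is always solved, the rest never

for k in (1, 8):
    print(k, mean_pass_at_k(spread, n, k), mean_pass_at_k(collapsed, n, k))
```

In this toy setup both profiles score 0.25 at k = 1, but at k = 8 the spread profile reaches roughly 0.96 while the collapsed profile stays at 0.25. That is the kind of gap a Pass@1-only view would hide.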
Jun 17, 2026 1 source