Artificial Intelligence #artificial intelligence #residual networks
Norm-Agnostic Residual Networks Offer Path to Scaling Adaptive Depth in Deep Learning
Researchers introduce NAG, a norm-agnostic residual architecture that keeps later layers from being suppressed by norm growth. This makes it possible to train much deeper models and yields an interpretable Mixture-of-Depths mechanism that can serve as a pretraining scaling strategy: models with 20-25% sparsity match the full-depth baseline under equal compute.
Jun 17, 2026
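To make the "suppressed by norm growth" claim concrete, here is a minimal toy sketch. It is not the NAG architecture, whose details the summary does not give. It assumes a plain residual stream in which every layer adds a random unit-norm update, and it measures how large each update is relative to the stream it is added to. The function names, depth, and width are hypothetical choices for illustration only.

```python
import math
import random


def _norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def _unit_vector(rng, width):
    """Random direction with norm 1 (Gaussian sample, then normalized)."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(width)]
    scale = _norm(vec)
    return [x / scale for x in vec]


def relative_update_sizes(depth=32, width=256, seed=0):
    """Return ||update|| / ||stream|| for each layer of a toy residual stream.

    Assumption: each layer's output is a random unit-norm vector, standing in
    for a block whose output scale does not track the stream norm.
    """
    rng = random.Random(seed)
    stream = _unit_vector(rng, width)
    ratios = []
    for _ in range(depth):
        update = _unit_vector(rng, width)
        # Relative size of this layer's contribution before it is added.
        ratios.append(_norm(update) / _norm(stream))
        # Plain residual connection: the stream accumulates every update.
        stream = [s + u for s, u in zip(stream, update)]
    return ratios


if __name__ == "__main__":
    ratios = relative_update_sizes()
    print(f"first layer relative update: {ratios[0]:.3f}")
    print(f"last layer relative update:  {ratios[-1]:.3f}")
```

In this toy setting the stream norm grows roughly with the square root of depth, so the last layer's update is a much smaller fraction of the stream than the first layer's. That shrinking relative contribution is the kind of effect a norm-agnostic design would aim to avoid.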