Artificial Intelligence #artificial intelligence #deep learning
New Theory Explains How Deep Transformers Achieve Adaptive Inference Using Function Vectors
A new research paper proposes a theory of deep transformers as mean-field interacting systems that carry out distributed inference. In this account, the model uses 'function vectors' to adaptively infer latent context variables at progressively finer scales from layer to layer. The theory predicts a relationship between non-Gaussian hierarchical structure and transformer depth, and the prediction is tested with constrained linear attention models.
Jun 16, 2026