Artificial Intelligence #gradient descent #recurrent networks
New Architecture GRIL Enables Gradient Descent-Like Learning in Linear Recurrent Networks
Researchers introduce the Gradient-based Recurrent In-context Learner (GRIL), a linear recurrent network architecture with windowed cross-product self-attention that can implement minibatch gradient descent on a task-specific predictor in a single forward pass. According to the researchers, the design performs strongly on synthetic in-context learning tasks, the Long Range Arena benchmark, and language modeling.
Jun 16, 2026 1 source
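To make the summary's central claim concrete, here is a minimal sketch of why a linear recurrence can carry out minibatch gradient descent. It is not GRIL itself: it assumes a plain linear least-squares predictor with scalar targets, non-overlapping windows, and a fixed learning rate, and every name in it (`gd_step`, `recurrent_step`, `run`, `WINDOW`, `LR`) is hypothetical. The point is only that one gradient step on a window of examples can be rewritten as `w <- A w + c`, where `A` and `c` are sums of products of the inputs and targets in that window.

```python
import random


def gd_step(w, xs, ys, lr):
    """One explicit minibatch gradient step on the loss 0.5 * (w.x - y)^2."""
    d, b = len(w), len(xs)
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j in range(d):
            grad[j] += err * x[j] / b
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def recurrent_step(w, xs, ys, lr):
    """The same update written as a linear recurrence w <- A w + c."""
    d, b = len(w), len(xs)
    # A = I - (lr / b) * sum_i x_i x_i^T   (input-dependent state transition)
    A = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    # c_vec = (lr / b) * sum_i y_i x_i     (input-dependent drive term)
    c_vec = [0.0] * d
    for x, y in zip(xs, ys):
        for r in range(d):
            c_vec[r] += lr * y * x[r] / b
            for c in range(d):
                A[r][c] -= lr * x[r] * x[c] / b
    return [sum(A[r][k] * w[k] for k in range(d)) + c_vec[r] for r in range(d)]


def run(step, data, window, lr, d):
    """Scan the sequence once, applying one update per non-overlapping window."""
    w = [0.0] * d
    for start in range(0, len(data), window):
        batch = data[start:start + window]
        xs = [x for x, _ in batch]
        ys = [y for _, y in batch]
        w = step(w, xs, ys, lr)
    return w


# Synthetic, noiseless in-context regression task (illustrative values only).
random.seed(0)
D, WINDOW, LR = 3, 4, 0.1
w_true = [1.0, -2.0, 0.5]
data = []
for _ in range(200):
    x = [random.gauss(0.0, 1.0) for _ in range(D)]
    y = sum(a * b for a, b in zip(w_true, x))
    data.append((x, y))

w_gd = run(gd_step, data, WINDOW, LR, D)
w_rec = run(recurrent_step, data, WINDOW, LR, D)

# The two routes should agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(w_gd, w_rec))
print("explicit GD :", w_gd)
print("recurrence  :", w_rec)
print("max abs gap :", max_gap)
```

Running the script prints both weight vectors and the largest gap between them. Because the recurrent form is linear in the state `w`, a single pass over the sequence is enough to apply every update, which is the property the summary attributes to the architecture. How GRIL realizes this with windowed cross-product self-attention is described in the source, not in this sketch.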