Transformers are inherently stateless: each forward pass processes a sequence independently, with no memory of past interactions. To build AI agents that persist across sessions, researchers must find ways to add stateful memory. A new paper on arXiv presents the Reservoir Attention Network (RAN), an architecture that injects a fixed, randomly initialized reservoir into the mid-layer attention of a pretrained transformer to carry state across forward passes.
The paper, by authors Leonhart and Emma, describes itself as a feasibility and dynamics study of RAN. The reservoir is left untrained (fixed random) by design, to isolate whether untrained recurrent dynamics alone suffice to carry usable cross-pass state. The authors treat trained recurrence as a complementary, more expensive direction.
Architecture and Experiments
RAN injects the reservoir into the mid-layer attention of the transformer. The reservoir acts as a content-addressable memory that holds state, allowing information from previous passes to influence subsequent ones. The experiments spanned multiple model sizes:
| Model | Parameter Sizes Tested |
|---|---|
| GPT-2 | 124M, 355M |
| Qwen2.5 | 0.5B, 1.5B |
All experiments were run on a single consumer GPU, which suggests the approach is computationally accessible at these model sizes. The tasks are described as minimal probes chosen to isolate individual mechanisms, not full-scale agent benchmarks. The paper states that the broader "always-alive agent" vision is treated as compute-limited future work, not a claim of this paper.
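The mechanism described in this section can be made concrete with a toy sketch. The Python below is an illustration under stated assumptions, not the paper's implementation: it assumes an echo-state style update (`tanh` over fixed random weights), a spectral radius of 0.9, and that the reservoir state enters attention as one extra key/value slot. All names (`W_res`, `W_in`, `W_out`, `update_reservoir`, `attention_with_reservoir`) and sizes are hypothetical, and the query/key/value projections and causal masking of a real transformer layer are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16  # hidden width of the toy attention layer (assumed)
D_RES = 32    # reservoir size (assumed)

# Fixed random reservoir weights: drawn once and never trained.
W_res = rng.standard_normal((D_RES, D_RES))
# Rescale to spectral radius 0.9 (an assumed value) so old state fades gradually.
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))
W_in = rng.standard_normal((D_RES, D_MODEL)) * 0.1                # input projection
W_out = rng.standard_normal((D_MODEL, D_RES)) / np.sqrt(D_RES)    # read-out into model space


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def update_reservoir(state, hidden):
    """Advance the reservoir one step, driven by the mean hidden vector of a pass."""
    return np.tanh(W_res @ state + W_in @ hidden.mean(axis=0))


def attention_with_reservoir(hidden, state):
    """Single-head self-attention with one extra key/value slot read from the reservoir."""
    mem = W_out @ state                         # shape (D_MODEL,)
    kv = np.vstack([mem[None, :], hidden])      # shape (1 + T, D_MODEL)
    scores = hidden @ kv.T / np.sqrt(D_MODEL)   # shape (T, 1 + T)
    return softmax(scores) @ kv                 # shape (T, D_MODEL)


# Two separate forward passes; only `state` links them.
state = np.zeros(D_RES)
pass_1 = rng.standard_normal((5, D_MODEL))
pass_2 = rng.standard_normal((4, D_MODEL))

out_1 = attention_with_reservoir(pass_1, state)
state = update_reservoir(state, pass_1)
out_2 = attention_with_reservoir(pass_2, state)  # depends on pass_1 only through `state`
```

In this sketch nothing in the reservoir is learned, so the only thing that changes between passes is the `state` vector. That is the property the study sets out to probe: whether such untrained dynamics carry usable information from one pass to the next.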
Implications
While the research is preliminary, it opens a new line of inquiry for stateful transformers without the heavy cost of training recurrent components. For enterprise technology leaders, this could eventually lead to AI systems that maintain context over longer interactions, such as supply chain optimization agents that remember past orders and disruptions without needing to re-process historical data. However, the paper does not claim any such applications; the authors explicitly limit their claims to the feasibility of the proposed mechanism.
The use of a fixed random reservoir is notable: it avoids backpropagation through time, keeping training costs low and allowing the architecture to be retrofitted into existing pretrained models. The study covered two model families, GPT-2 and Qwen2.5, which suggests the method is not tied to a single architecture, although two families at up to 1.5B parameters is limited evidence of generality.
Future Work
The paper identifies several directions for future research: training the reservoir (rather than leaving it fixed), scaling to larger models, and testing on more complex agent-like tasks. For now, the core contribution is demonstrating that cross-pass state can be achieved with minimal modification to existing transformers.
Enterprise technology buyers should watch this space: if the approach matures, it could enable persistent AI assistants for logistics, customs, and trade finance without requiring full retraining of large models. That the experiments ran on a single consumer GPU is a positive sign for cost-effectiveness.
"A feasibility and dynamics study of the Reservoir Attention Network (RAN), an architecture that injects a fixed, randomly-initialized reservoir into the mid-layer attention of a pretrained transformer to carry state across forward passes." — from the paper abstract
RAN remains an academic proof of concept, with the always-alive agent vision deferred to future work. But the simplicity of the injection approach, a fixed random reservoir, makes it an attractive candidate for further exploration by the AI research community.