Artificial Intelligence #long-context #transformer
Parallel Hybrid Architecture Combines GSS and Attention for Efficient Long-Context Language Modeling
Researchers propose the Parallel Hybrid Architecture (PHA), which combines Gated State Spaces (GSS), Grouped Query Attention (GQA), and Feed-Forward Networks (FFN) in parallel branches whose outputs are fused by a learnable mixing mechanism. On WikiText-103, PHA achieves a perplexity (PPL) of 16.51 at 125M parameters, outperforming comparable models, and scales to 180M parameters with a PPL of 16.42 while delivering 24% higher throughput and up to 40% lower memory usage.
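To make the idea of parallel branches fused by a learnable mixing mechanism concrete, here is a minimal sketch in plain Python. The summary does not say how the mixing is parameterized, so this example assumes one scalar logit per branch, normalized with a softmax. The three branch functions are toy stand-ins, not real GSS, GQA, or FFN layers, and every name here (`mix_branches`, `parallel_hybrid_block`, `mix_logits`) is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_branches(branch_outputs, mix_logits):
    """Fuse branch outputs as a softmax-weighted sum (assumed mixing form)."""
    if len(branch_outputs) != len(mix_logits):
        raise ValueError("need exactly one mixing logit per branch")
    weights = softmax(mix_logits)
    dim = len(branch_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, branch_outputs))
        for i in range(dim)
    ]


# Toy stand-ins for the three branches; real layers would be far richer.
def gss_branch(x):
    return [0.5 * v for v in x]        # placeholder for a gated state space


def attention_branch(x):
    mean = sum(x) / len(x)
    return [mean for _ in x]           # placeholder for grouped query attention


def ffn_branch(x):
    return [max(0.0, v) for v in x]    # placeholder for a feed-forward network


def parallel_hybrid_block(x, branches, mix_logits):
    """Apply every branch to the same input, then fuse the results."""
    outputs = [branch(x) for branch in branches]
    return mix_branches(outputs, mix_logits)


if __name__ == "__main__":
    x = [1.0, -2.0, 4.0]
    branches = [gss_branch, attention_branch, ffn_branch]
    # Equal logits give each branch the same weight; in training the
    # logits would be learned parameters.
    print(parallel_hybrid_block(x, branches, [0.0, 0.0, 0.0]))
```

In this sketch all branches read the same input, so they could be evaluated concurrently. That is the structural difference from stacking the same layers sequentially, where each layer waits on the previous one.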
Jun 16, 2026 1 source