Topic
long-context
Parallel Hybrid Architecture Combines GSS and Attention for Efficient Long-Context Language Modeling
Researchers propose the Parallel Hybrid Architecture (PHA), which runs Gated State Spaces (GSS), Grouped Query Attention (GQA), and Feed-Forward Networks (FFN) as parallel branches and fuses their outputs with a learnable mixing mechanism. On WikiText-103, PHA reaches a perplexity (PPL) of 16.51 at 125M parameters, outperforming comparable models, and scales to 180M parameters with 16.42 PPL while delivering 24% higher throughput and up to 40% lower memory usage.
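To make the idea of a learnable mixing mechanism concrete, here is a minimal sketch of one common way to fuse parallel branch outputs: a softmax over per-branch logits followed by a weighted sum. This is an assumption for illustration only; the summary does not specify PHA's actual fusion rule. The names `softmax`, `fuse_branches`, and `mix_logits` are hypothetical, and the three vectors are placeholders standing in for real GSS, GQA, and FFN outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branch_outputs, mix_logits):
    """Weighted sum of equally sized branch outputs.

    branch_outputs: one list of floats per branch.
    mix_logits: one scalar per branch; in a trained model these would be
    learned parameters (assumed here, not taken from the paper).
    """
    if len(branch_outputs) != len(mix_logits):
        raise ValueError("need exactly one mixing logit per branch")
    weights = softmax(mix_logits)
    dim = len(branch_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, branch_outputs))
        for i in range(dim)
    ]


# Placeholder vectors, not real model activations.
gss_out = [1.0, 0.0, 2.0]
gqa_out = [0.0, 1.0, 2.0]
ffn_out = [2.0, 2.0, 2.0]

# Equal logits give every branch the same weight.
mix_logits = [0.0, 0.0, 0.0]
fused = fuse_branches([gss_out, gqa_out, ffn_out], mix_logits)
print(fused)
```

Raising one logit relative to the others shifts the fused output toward that branch, which is the sense in which training such logits would let a model decide how much each branch contributes.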
MMLongEmbed Benchmark Reveals Limitations in Long-Context Multimodal Embedding Models
MMLongEmbed is presented as the first comprehensive benchmark for evaluating multimodal embedding models (MEMs) in long-context scenarios. It comprises four retrieval tasks covering text, document, and video modalities. The evaluation shows that current MEMs rely heavily on superficial feature matching and struggle with deep semantic and structural dependencies, and that their performance varies systematically with context length and with where the key information is placed.