Topic

learnable

1 story

Parallel Hybrid Architecture Combines GSS and Attention for Efficient Long-Context Language Modeling

Artificial Intelligence #long-context #transformer

Researchers propose the Parallel Hybrid Architecture (PHA), which runs Gated State Spaces (GSS), Grouped Query Attention, and Feed-Forward Network branches in parallel and fuses their outputs with a learnable mixing mechanism. On WikiText-103, PHA reaches a perplexity (PPL) of 16.51 at 125M parameters, outperforming comparable models, and scales to 180M parameters with a PPL of 16.42 while delivering 24% higher throughput and up to 40% lower memory usage.

Jun 16, 2026 1 source
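
To make the "parallel branches fused by a learnable mixing mechanism" idea in the summary above concrete, here is a minimal sketch in plain Python. The summary does not say how PHA actually mixes its branches, so everything here is an assumption for illustration: the function names `softmax` and `mix_branches`, the use of one scalar logit per branch, and the softmax normalization are hypothetical and may differ from the researchers' method. The branch outputs are stand-in vectors, not real GSS, attention, or feed-forward computations.

```python
import math


def softmax(logits):
    """Turn raw mixing logits into positive weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_branches(branch_outputs, mix_logits):
    """Fuse parallel branch outputs as a weighted sum.

    Assumption: one scalar logit per branch, softmax-normalized.
    In training, `mix_logits` would be learned parameters.
    """
    if len(branch_outputs) != len(mix_logits):
        raise ValueError("need exactly one mixing logit per branch")
    weights = softmax(mix_logits)
    dim = len(branch_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, branch_outputs))
        for i in range(dim)
    ]


# Stand-in outputs for the three parallel branches (hypothetical values).
gss_out = [1.0, 0.0, 2.0]
attn_out = [0.0, 1.0, 2.0]
ffn_out = [2.0, 2.0, 2.0]

# Equal logits give each branch the same weight (a plain average).
fused = mix_branches([gss_out, attn_out, ffn_out], [0.0, 0.0, 0.0])
```

With equal logits the fused vector is simply the average of the three branches. Raising one logit shifts the result toward that branch, which is the behavior a trained mixing layer could exploit to favor whichever branch is most useful.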