iGEN
Visit IGEN World Explore IGEN Expo
EXPLORE UPGRADE PLANS
BREAKING
NordVPN's Private Server Add-On Gives Enterprises Isolated Hardware and Static IP for Secure Remote Access India Soyabean Acreage Seen Rising Up to 10% on High Prices, Weak Monsoon Outlook FlowMPC: New Framework Combines Flow Matching and World Models to Improve Robot Manipulation DYNA Framework Uses Temporal Knowledge Graphs to Reduce LLM Forgetting Without Retraining RAMS: Resource-Adaptive Model Switching for Embedded Edge Perception Under Load Open-SWE-Traces: 207K Multilingual Trajectories Set New Standard for Autonomous Software Engineering Agents Infant-Inspired Noise Boosts Deep RL Exploration, Research from arXiv Shows Mutual Distillation of Dual Foundation Models Achieves State-of-the-Art PET/CT Segmentation with Only 5 Labeled Cases SPARK Method Activates Latent Security Knowledge in LLMs for Secure Code Generation Apple explains why Siri AI took so long: first version ready last year but rebuilt from ground up NordVPN's Private Server Add-On Gives Enterprises Isolated Hardware and Static IP for Secure Remote Access India Soyabean Acreage Seen Rising Up to 10% on High Prices, Weak Monsoon Outlook FlowMPC: New Framework Combines Flow Matching and World Models to Improve Robot Manipulation DYNA Framework Uses Temporal Knowledge Graphs to Reduce LLM Forgetting Without Retraining RAMS: Resource-Adaptive Model Switching for Embedded Edge Perception Under Load Open-SWE-Traces: 207K Multilingual Trajectories Set New Standard for Autonomous Software Engineering Agents Infant-Inspired Noise Boosts Deep RL Exploration, Research from arXiv Shows Mutual Distillation of Dual Foundation Models Achieves State-of-the-Art PET/CT Segmentation with Only 5 Labeled Cases SPARK Method Activates Latent Security Knowledge in LLMs for Secure Code Generation Apple explains why Siri AI took so long: first version ready last year but rebuilt from ground up
Home ›› Topics ›› transformer

Topic

transformer

4 stories
Parallel Hybrid Architecture Combines GSS and Attention for Efficient Long-Context Language Modeling Technology
Artificial Intelligence #long-context#transformer

Parallel Hybrid Architecture Combines GSS and Attention for Efficient Long-Context Language Modeling

Researchers propose the Parallel Hybrid Architecture (PHA), combining Gated State Spaces, Grouped Query Attention, and Feed-Forward Networks in parallel branches fused by a learnable mixing mechanism. On WikiText-103, PHA achieves 16.51 PPL at 125M parameters, outperforming comparable models, and scales to 180M parameters with 16.42 PPL while delivering 24% higher throughput and up to 40% lower memory usage.

Jun 16, 2026 1 source
PolyKV: Layer-Wise KV Cache Compression Boosts LLM Inference Efficiency by Up to 54.5% Technology
Artificial Intelligence #kv cache#compression

PolyKV: Layer-Wise KV Cache Compression Boosts LLM Inference Efficiency by Up to 54.5%

PolyKV is a new framework for compressing the key-value cache in large language model inference. It selects a compression policy per transformer layer and allocates non-uniform cache budgets, outperforming uniform approaches. On LongBench tasks, PolyKV recovers 40%-54.5% of the performance gap between the strongest single-policy baseline and full KV cache.

Jun 16, 2026 1 source
VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference Technology
Artificial Intelligence #video anomaly detection#deformable attention

VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference

A new AI framework, VigilFormer, uses deformable attention and causal inference to detect anomalies in surveillance video at 41.5 FPS, outperforming prior methods on three benchmarks.

Jun 16, 2026 1 source
NVIDIA Open-Sources Nemotron 3 Ultra: 550B-Parameter Hybrid Mamba-Transformer Model for Agentic AI Technology
Artificial Intelligence #ai#machine learning

NVIDIA Open-Sources Nemotron 3 Ultra: 550B-Parameter Hybrid Mamba-Transformer Model for Agentic AI

NVIDIA introduced Nemotron 3 Ultra, a 550 billion total parameter Mixture-of-Experts language model with a hybrid Mamba-Attention architecture. Only 55 billion parameters are active per inference. Pre-trained on 20 trillion tokens and supporting a 1 million token context length, the model achieves up to 6x higher inference throughput versus state-of-the-art public LLMs while matching accuracy. All checkpoints, training data, and recipes are open-sourced on HuggingFace.

Jun 16, 2026 2 sources