Topic

sequence models

1 story

New AI Benchmark Reveals Brittle Reasoning in Large Language Models on Symbolic Puzzles

Artificial Intelligence #recurrent reasoning #symbolic puzzles

Researchers introduce RecurrReason, a benchmark of 10,817 symbolic puzzles designed to test recurrent reasoning in sequence models. The study finds that T5-style encoder-decoder models significantly outperform GPT-2-style decoder-only models on most tasks, yet every model scores 0% on River Crossing puzzles. Architecture proves a stronger determinant of success than scale, and pre-training helps only on puzzles with locally structured transitions.

Jun 16, 2026 · 1 source