Artificial Intelligence #recurrent reasoning #symbolic puzzles
New AI Benchmark Reveals Brittle Reasoning in Large Language Models on Symbolic Puzzles
Researchers introduce RecurrReason, a benchmark of 10,817 symbolic puzzles that tests recurrent reasoning in sequence models. The study finds that T5-style encoder-decoder models significantly outperform GPT-2-style decoder-only models on most tasks, but every model scores 0% on River Crossing puzzles. It also finds that architecture is a stronger determinant of success than scale, and that pre-training helps only on puzzles with locally structured transitions.
Jun 16, 2026