reasoning

35 stories

Think Again or Think Longer? Selective Verification Boosts LLM Accuracy While Cutting Compute Costs

A new arXiv preprint proposes SEVRA, a serving-layer controller that selectively verifies LLM reasoning outputs. On MATH-500, it reaches 76.3% accuracy, higher than always verifying, while cutting post-generation tokens by 26.8% and lowering the rate of harmful flips from 2.2% to 1.0%. The study also offers a deployment rule: tune the initial reasoning budget first, then use selective recovery when explicit checks are needed.

Hypergraph Reasoning Framework Boosts Semantic Communication Accuracy by 36.6%

Reinforcement-Aware Knowledge Distillation Boosts LLM Reasoning Efficiency

Beyond Reasoning Gains: Mitigating General-Capability Forgetting in Large Reasoning Models

FM-Agent: New Framework Automates Formal Code Verification for Large-Scale LLM-Generated Software

Independent Combinatorial Tokens Framework Boosts LLM Reasoning Performance by Up to 14.9%

New Method LUCID Detects Hallucinations in LLM-Based Knowledge Graph Reasoning

QMFOL Benchmark Reveals LLM Reasoning Degrades with Logical Complexity, New Framework Enables Precise Evaluation

MedAI Study Evaluates TxAgent's Therapeutic Reasoning in NeurIPS CURE-Bench Competition

Research Shows Code Execution Outperforms Natural Language for AI Algorithmic Reasoning

SorryDB Benchmark Tests AI Provers on Real-World Lean Theorem Completion Tasks

Multi-Sequence Verifiers Cut Inference Latency in Half for LLM Reasoning

New Framework TRACED Evaluates LLM Reasoning Using Geometric Stability and Progress

AgenticRec: A Recommender Framework That Aligns LLM Reasoning with User Preferences

New Research Shows Chain-of-Thought Reasoning Should Be Selective, Not Default, for LLMs

Study: LLM Accuracy Declines Predictably as Reasoning Steps Increase in Clinical AI Tasks

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

Tyler Framework Boosts LLM Reasoning by Up to 14 Points with Smarter Compute Allocation

Multi-Agent Peer-Reviewed Reasoning Boosts LLM Accuracy in Medical Question Answering

XMedFusion: A Knowledge-Guided Multimodal Perception and Reasoning Framework for Autonomous Medical Systems

DeepRoot Multi-Agent System Enables Therapeutic Reasoning Over Historical Medical Texts with 47.6% Accuracy

New Hindsight Self-Distillation Method Improves LLM Reasoning by Localizing Credit at Divergence Points

New Defense Keeps Attack Success Rate Below 4% for Adaptive Prompt Injection on LLM Agents

AdaSTORM Breakthrough Scales LLM Reasoning to Thousand-Node Dynamic Graphs, Paves Way for Supply Chain AI

VibeThinker-3B: Small Language Model Matches Giants in Verifiable Reasoning, According to arXiv Paper

Limited Marginal Benefit of Reasoning-Heavy LLMs in ESG Scoring: Study on Japanese Firms

AdaMame: New Training Recipe Solves Language Collapse in Multilingual Reasoning Models

New Self-Enhanced Fine-Tuning Method Boosts Text-to-SQL Reasoning and Generalization

New Benchmark IRTS-ToolBench Tests LLMs on Irregular Time Series Question Answering

Latent Thought Flow: Efficient Reasoning in LLMs Cuts Cost and Boosts Accuracy

Think-at-Hard: Selective Latent Iterations Boost LLM Reasoning Accuracy by Up to 6.8%

CycliST Benchmark Reveals Video Language Models Struggle with Cyclical State Transitions

PrologMCP: A Standardized Prolog Tool Interface That Boosts LLM Agents’ Deductive Accuracy

The Quality-Utility Paradox: Why High-Reward Data Impairs Small Model Mathematical Reasoning

Semi-Supervised Framework Scales LLM Reasoning Using 10-15x Fewer Labels Than Traditional Methods