iGEN

Topic

reasoning

10 stories

Limited Marginal Benefit of Reasoning-Heavy LLMs in ESG Scoring: Study on Japanese Firms

Technology, Artificial Intelligence #llm #esg

A four-model consensus study of 10 Japanese listed firms found that reasoning-heavy LLMs add little value over cheaper alternatives in ESG narrative scoring: the mean absolute deviation was only 0.38 on a 5-point scale, while cost was 5.6x higher.

Jun 16, 2026 1 source

AdaMame: New Training Recipe Solves Language Collapse in Multilingual Reasoning Models

Technology, Artificial Intelligence #artificial intelligence #multilingual

AdaMame, a two-stage training recipe for multilingual mathematical reasoning, addresses language collapse in large reasoning models. It adaptively aligns reasoning language to the query language without compromising accuracy, achieving Pareto-optimal performance across 12 languages.

Jun 16, 2026 1 source

New Self-Enhanced Fine-Tuning Method Boosts Text-to-SQL Reasoning and Generalization

Technology, Artificial Intelligence #text-to-sql #reasoning

Researchers propose CoTE-SQL, a self-enhanced fine-tuning method that improves text-to-SQL generation by integrating reasoning traces, structured chain-of-thought prompting, and execution error correction. The approach achieves state-of-the-art results on the BIRD and Spider benchmarks, particularly on complex queries.

Jun 16, 2026 1 source

New Benchmark IRTS-ToolBench Tests LLMs on Irregular Time Series Question Answering

Technology, Artificial Intelligence #ai #artificial intelligence

A research paper introduces IRTS-ToolBench, a benchmark of 1,700 questions spanning 10 task types across 13 domains, to evaluate large language models (LLMs) and AI agents on irregular time series question answering (TSQA). Existing TSQA benchmarks assume regular sampling; IRTS-ToolBench fills that gap with standardized inputs and a reproducible evaluation protocol for verifiable agentic data science.

Jun 16, 2026 2 sources

Latent Thought Flow: Efficient Reasoning in LLMs Cuts Cost and Boosts Accuracy

Technology, Artificial Intelligence #large language models #ai

Researchers propose Latent Thought Flow (LTF), a method that models LLM reasoning as continuous trajectories in latent space, using GFlowNet and entropy-weighted objectives. LTF outperforms explicit Chain-of-Thought and latent reasoning baselines, achieving 9.5% higher accuracy while cutting reasoning length by 27.2%. It targets the linguistic bottleneck that inflates inference costs.

Jun 16, 2026 1 source

Think-at-Hard: Selective Latent Iterations Boost LLM Reasoning Accuracy by Up to 6.8%

Technology, Artificial Intelligence #artificial intelligence #large language models

A new research paper proposes Think-at-Hard (TaH), a looped transformer that selectively performs latent iterations only on tokens likely to be incorrect. By skipping iterations on 93% of tokens, TaH outperforms always-iterate models by 3.8-4.4% and single-iteration baselines by up to 6.8%, while requiring negligible extra parameters.

Jun 16, 2026 1 source

CycliST Benchmark Reveals Video Language Models Struggle with Cyclical State Transitions

Technology, Artificial Intelligence #cyclist #video language model

The CycliST benchmark evaluates Video Language Models (VLMs) on cyclical state transitions. Results show that current VLMs struggle to detect and reason about periodic patterns, with no single model performing consistently across all tasks.

Jun 16, 2026 1 source

PrologMCP: A Standardized Prolog Tool Interface That Boosts LLM Agents’ Deductive Accuracy

Technology, Artificial Intelligence #prolog #llm

A team of researchers introduced PrologMCP, an open-source server that exposes Prolog as a stateful tool through the Model Context Protocol, allowing LLM agents to delegate deductive reasoning tasks. In evaluations on the PARARULE-Plus benchmark, an agent powered by PrologMCP achieved accuracy of 1.00 on a general sample, matching or exceeding reasoning LLMs, and 1.00/0.99 on a challenging subset where reasoning models dropped to 0.95/0.94.

Jun 16, 2026 2 sources

The Quality-Utility Paradox: Why High-Reward Data Impairs Small Model Mathematical Reasoning

Technology, Artificial Intelligence #ai #machine learning

A research paper identifies a 'Quality-Utility Paradox' in mathematical reasoning distillation: data refined by stronger models (Oracle) receives high reward scores but impairs small model performance compared to using the model's own self-generated traces. The authors propose Style-Aligned Refinement to preserve native reasoning patterns while incorporating logical corrections.

Jun 16, 2026 1 source

Semi-Supervised Framework Scales LLM Reasoning Using 10-15x Fewer Labels Than Traditional Methods

Technology, Artificial Intelligence #llm #reasoning

A new semi-supervised framework for training LLM reasoning uses a lightweight verifier to judge reasoning quality, requiring only a few labeled samples. Experiments on math problems and visual question answering show accuracy comparable to training with 10-15x more labeled data. The method could reduce the cost of building large-scale reasoning datasets.

Jun 16, 2026 2 sources