AdaMame: New Training Recipe Solves Language Collapse in Multilingual Reasoning Models

AdaMame, a two-stage training recipe for multilingual mathematical reasoning, addresses language collapse in large reasoning models. It adaptively aligns reasoning language to the query language without compromising accuracy, achieving Pareto-optimal performance across 12 languages.

iGEN Editorial
June 16, 2026

Large Reasoning Models (LRMs) perform strongly in English but often fail to reason in the language of the query, a phenomenon known as language collapse. According to the paper "AdaMame: A Training Recipe for Adaptive Multilingual Reasoning," published on arXiv, existing reinforcement-learning-based fixes typically add a binary language fidelity reward to the accuracy objective, yet still incur trade-offs in accuracy, mid-trace code-switching, and excessive token usage. The authors, Dayeon Ki, Kevin Duh, and Marine Carpuat, propose AdaMame as an alternative.

The Language Collapse Problem

The source reports that language collapse is a critical issue for enterprises deploying AI in multilingual environments—such as global trade, supply chain management, or customer support—where reasoning must occur in the query language. Previous RL-based approaches tried to enforce language fidelity via binary rewards, but these methods sacrificed accuracy, introduced code-switching during reasoning traces, and consumed excessive tokens. AdaMame addresses these limitations by adaptively aligning the reasoning language to the query language without compromising accuracy.
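To make the baseline concrete, here is a minimal Python sketch of a binary language fidelity reward added to an accuracy reward. It is an illustration only: the function name, the `weight` parameter and its default, and the idea of passing in pre-detected language codes are all assumptions made for this example, not details taken from the paper.

```python
def binary_fidelity_reward(
    is_correct: bool,
    trace_lang: str,
    query_lang: str,
    weight: float = 0.5,  # assumed weighting, not from the paper
) -> float:
    """Accuracy reward plus an all-or-nothing language bonus.

    trace_lang and query_lang are language codes (e.g. "ko", "en") that
    some external, hypothetical language detector has already produced.
    """
    accuracy = 1.0 if is_correct else 0.0
    # Binary: the whole trace either matches the query language or it does not.
    fidelity = 1.0 if trace_lang == query_lang else 0.0
    return accuracy + weight * fidelity


# A correct answer reasoned in English for a Korean query still scores
# higher than a wrong answer reasoned in Korean.
correct_wrong_lang = binary_fidelity_reward(True, "en", "ko")
wrong_right_lang = binary_fidelity_reward(False, "ko", "ko")
```

Because the bonus is all-or-nothing, a trace that is mostly in the query language but switches briefly earns the same fidelity score as one written entirely in English, which gives a rough intuition for why such rewards can be a blunt instrument.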

Inside AdaMame: Two-Stage Training

AdaMame consists of two stages, as described in the paper:

Stage | Method | Purpose
1 | Supervised Fine-Tuning (SFT) | Fine-tunes on naturally occurring reasoning traces across five languages to establish multilingual reasoning capability.
2 | Reinforcement Learning (RL) | Adapts Group Relative Policy Optimization (GRPO) with a query-conditioned alignment factor that grows progressively during training.

The second stage, called AdaMame-GRPO, is a modification of Group Relative Policy Optimization. According to the source, its adaptive alignment factor guides the model to first explore diverse reasoning languages and only later exploit reasoning in the query language. This progressive alignment is intended to avoid the trade-offs seen in prior methods.
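The explore-then-exploit idea can be sketched in a few lines of Python. This is not the paper's formulation: the linear ramp, the additive reward shaping, and every function name below are assumptions chosen for illustration. Only the group-relative normalization (each reward minus the group mean, divided by the group standard deviation) follows the standard GRPO recipe.

```python
import statistics


def alignment_factor(step: int, total_steps: int) -> float:
    """Assumed schedule: linear ramp from 0 to 1 over training."""
    return min(1.0, max(0.0, step / total_steps))


def shaped_rewards(correct, fidelity, step, total_steps):
    """Accuracy plus a language term whose weight grows with the step.

    correct:  0/1 accuracy per sampled response in the group
    fidelity: share of each trace written in the query language, in [0, 1]
    """
    alpha = alignment_factor(step, total_steps)
    return [c + alpha * f for c, f in zip(correct, fidelity)]


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style normalization within one group of samples."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One group of four sampled responses to the same query (toy values).
correct = [1.0, 1.0, 0.0, 0.0]
fidelity = [0.0, 1.0, 1.0, 0.0]

early = shaped_rewards(correct, fidelity, step=0, total_steps=100)
late = shaped_rewards(correct, fidelity, step=100, total_steps=100)
late_adv = group_relative_advantages(late)
```

Early in training the language term has zero weight, so any correct response is rewarded regardless of its reasoning language. By the end, a correct response in the query language has the highest advantage in its group.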

Results and Performance

The research evaluated AdaMame across two benchmarks, two LRMs, and 12 languages. The source states that AdaMame-GRPO achieves Pareto-optimal performance across reasoning accuracy, language fidelity, and token efficiency over all baselines. The strongest gains were observed on out-of-domain, lower-resource languages—a promising result for global enterprises serving diverse linguistic markets.
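For readers unfamiliar with the term, a method is Pareto-optimal when no other method is at least as good on every metric and strictly better on at least one. The Python sketch below checks this for the three metrics named above, treating accuracy and fidelity as higher-is-better and token count as lower-is-better. The method names and numbers are placeholders invented for the example and are not results from the paper.

```python
from typing import Dict, List, NamedTuple


class Score(NamedTuple):
    accuracy: float  # higher is better
    fidelity: float  # higher is better
    tokens: float    # lower is better


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b everywhere and better somewhere."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.fidelity >= b.fidelity
        and a.tokens <= b.tokens
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.fidelity > b.fidelity
        or a.tokens < b.tokens
    )
    return no_worse and strictly_better


def pareto_front(scores: Dict[str, Score]) -> List[str]:
    """Names of methods that no other method dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(other, s)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Placeholder values only, not taken from the paper.
toy_scores = {
    "method_a": Score(accuracy=0.80, fidelity=0.90, tokens=500),
    "method_b": Score(accuracy=0.70, fidelity=0.80, tokens=600),
    "method_c": Score(accuracy=0.85, fidelity=0.50, tokens=400),
}
front = pareto_front(toy_scores)
```

In this toy setup `method_a` dominates `method_b`, while `method_a` and `method_c` each win on a different metric, so both stay on the front.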

Implications for Enterprise AI

For enterprise technology decision-makers, especially those in logistics and supply chain, the ability to reason accurately in the user's language matters for document processing, customs declarations, and trade finance. AdaMame's approach of adaptively aligning the reasoning language without accuracy loss could enable more reliable multilingual AI systems. Its two-stage recipe combining SFT and GRPO offers a possible template for improving LRMs in production settings. The source focuses on mathematical reasoning, so whether adaptive alignment carries over to other domains requiring multilingual reasoning, such as trade documentation and compliance, remains to be shown.

In summary, AdaMame presents a training recipe that overcomes the accuracy–language fidelity trade-off, offering enterprises a path to deploy LLMs that truly reason in the query language. The research is publicly available on arXiv under a CC BY 4.0 license, encouraging further development and adoption.

