AdaMame: New Training Recipe Solves Language Collapse in Multilingual Reasoning Models

AdaMame, a two-stage training recipe for multilingual mathematical reasoning, addresses language collapse in large reasoning models. It adaptively aligns reasoning language to the query language without compromising accuracy, achieving Pareto-optimal performance across 12 languages.

iGEN Editorial

June 16, 2026

Large Reasoning Models (LRMs) perform strongly in English but often fail to reason in the language of the query, a phenomenon known as language collapse. According to the paper "AdaMame: A Training Recipe for Adaptive Multilingual Reasoning," published on arXiv, existing reinforcement learning (RL) fixes typically add a binary language-fidelity reward to the accuracy objective, yet still incur trade-offs: lower accuracy, mid-trace code-switching, and excessive token usage. The authors, Dayeon Ki, Kevin Duh, and Marine Carpuat, propose AdaMame as an alternative.

The Language Collapse Problem

The source reports that language collapse is a critical issue for enterprises deploying AI in multilingual environments, such as global trade, supply chain management, or customer support, where reasoning must occur in the query language. Previous RL-based approaches enforced language fidelity with binary rewards, but at the cost of lower accuracy, code-switching within reasoning traces, and excessive token consumption. AdaMame addresses these limitations by adaptively aligning the reasoning language to the query language without compromising accuracy.
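To make the baseline concrete, here is a minimal Python sketch of the kind of reward the article says prior RL fixes use: an accuracy term plus a binary language-fidelity term. It is not taken from the paper. The `Rollout` type, the per-segment language tags (assumed to come from some external language-identification tool), exact-match answer checking, and the `weight` parameter are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    answer: str
    # Hypothetical: one language tag per sentence of the reasoning trace,
    # assumed to be produced by an external language-ID tool.
    segment_langs: List[str]


def accuracy_reward(rollout: Rollout, gold: str) -> float:
    """1.0 if the final answer exactly matches the reference, else 0.0."""
    return 1.0 if rollout.answer.strip() == gold.strip() else 0.0


def binary_fidelity_reward(rollout: Rollout, query_lang: str) -> float:
    """All-or-nothing: 1.0 only if every trace segment is in the query language."""
    if not rollout.segment_langs:
        return 0.0
    return 1.0 if all(lang == query_lang for lang in rollout.segment_langs) else 0.0


def baseline_reward(rollout: Rollout, gold: str, query_lang: str, weight: float = 1.0) -> float:
    """Accuracy objective plus a weighted binary language-fidelity bonus."""
    return accuracy_reward(rollout, gold) + weight * binary_fidelity_reward(rollout, query_lang)
```

Because the fidelity term is all-or-nothing, a correct answer to a French query whose trace slips into English for a single sentence (`Rollout("42", ["fr", "en", "fr"])`) scores 1.0, exactly the same as a trace written entirely in English. The policy gets no graded signal about how much it code-switched.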

Inside AdaMame: Two-Stage Training

AdaMame consists of two stages, as described in the paper:

Stage	Method	Purpose
1	Supervised Fine-Tuning (SFT)	Fine-tunes on naturally occurring reasoning traces across five languages to establish multilingual reasoning capability.
2	Reinforcement Learning (RL)	Adapts Group Relative Policy Optimization (GRPO) with a query-conditioned alignment factor that grows progressively during training.

The second stage, called AdaMame-GRPO, modifies Group Relative Policy Optimization by adding the query-conditioned alignment factor. According to the source, because this factor grows progressively during training, the model first explores diverse reasoning languages and only later exploits reasoning in the query language. This progressive alignment avoids the trade-offs seen in prior methods.
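The explore-then-exploit idea can be illustrated with a minimal sketch of a GRPO-style training signal. Standard GRPO samples a group of rollouts for the same query and normalizes each rollout's reward against the group's mean and standard deviation; the sketch adds a language-alignment term whose weight grows over training. Everything beyond the group normalization is an assumption: the linear schedule in `alignment_factor`, the fraction-of-segments alignment score, and all function names are hypothetical stand-ins, and the paper's actual query-conditioned factor may be defined quite differently.

```python
import statistics
from typing import List, Sequence


def alignment_factor(step: int, total_steps: int, max_weight: float = 1.0) -> float:
    """Hypothetical linear schedule: weight 0 at the start of RL, max_weight at the end."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * progress


def language_alignment(segment_langs: Sequence[str], query_lang: str) -> float:
    """Assumed graded score: fraction of trace segments written in the query language."""
    if not segment_langs:
        return 0.0
    return sum(lang == query_lang for lang in segment_langs) / len(segment_langs)


def group_advantages(
    accuracies: Sequence[float],
    alignments: Sequence[float],
    step: int,
    total_steps: int,
    eps: float = 1e-6,
) -> List[float]:
    """GRPO-style advantages: each rollout's reward normalized within its group."""
    weight = alignment_factor(step, total_steps)
    # Early in training the weight is near 0, so rollouts in any language
    # compete on accuracy alone (exploration); later, alignment counts more.
    rewards = [acc + weight * align for acc, align in zip(accuracies, alignments)]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group in which two rollouts are correct but only the first reasons in the query language, `group_advantages([1, 1, 0, 0], [1, 0, 1, 0], step=0, total_steps=100)` gives both correct rollouts the same advantage, while at `step=100` the query-language rollout receives the larger one.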

Results and Performance

The research evaluated AdaMame across two benchmarks, two LRMs, and 12 languages. The source states that AdaMame-GRPO achieves Pareto-optimal performance across reasoning accuracy, language fidelity, and token efficiency relative to all baselines. The strongest gains were observed on out-of-domain, lower-resource languages, a promising result for global enterprises serving diverse linguistic markets.
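A system is Pareto-optimal when no other system is at least as good on every axis and strictly better on at least one. That check can be sketched in a few lines of Python. The `Scores` fields, the convention that fewer reasoning tokens is better, and the example numbers below are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List, NamedTuple


class Scores(NamedTuple):
    accuracy: float  # higher is better
    fidelity: float  # higher is better (share of reasoning in the query language)
    tokens: float    # average reasoning tokens; lower is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.fidelity >= b.fidelity and a.tokens <= b.tokens
    strictly_better = a.accuracy > b.accuracy or a.fidelity > b.fidelity or a.tokens < b.tokens
    return no_worse and strictly_better


def pareto_front(systems: Dict[str, Scores]) -> List[str]:
    """Names of systems that no other system dominates, in input order."""
    return [
        name
        for name, s in systems.items()
        if not any(dominates(other, s) for o, other in systems.items() if o != name)
    ]
```

With three made-up systems, `pareto_front({"A": Scores(0.80, 0.90, 1000), "B": Scores(0.70, 0.90, 1200), "C": Scores(0.85, 0.60, 900)})` returns `["A", "C"]`: A dominates B, while A and C each win on a different axis.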

Implications for Enterprise AI

For enterprise technology decision-makers, especially those in logistics and supply chain, the ability to reason accurately in the user's language is critical for document processing, customs declarations, and trade finance. AdaMame's approach, which aligns the reasoning language adaptively without sacrificing accuracy, could enable more reliable multilingual AI systems. Its two-stage recipe combining SFT and GRPO offers a template for improving LRMs in production settings. While the paper focuses on mathematical reasoning, the underlying principle of adaptive alignment may carry over to other domains that require multilingual reasoning, including trade documentation and compliance.

In summary, AdaMame is a training recipe that overcomes the trade-off between accuracy and language fidelity, offering enterprises a path to deploy reasoning models that actually reason in the query language. The research is publicly available on arXiv under a CC BY 4.0 license, encouraging further development and adoption.

