Large Reasoning Models (LRMs) demonstrate strong performance in English but often fail to reason in the language of the query, a phenomenon known as language collapse. According to the paper "AdaMame: A Training Recipe for Adaptive Multilingual Reasoning", published on arXiv, existing reinforcement-learning-based fixes typically add a binary language fidelity reward to the accuracy objective, yet they still lose accuracy, code-switch mid-trace, and use excessive tokens. The authors, Dayeon Ki, Kevin Duh, and Marine Carpuat, propose a training recipe called AdaMame to address this.
The Language Collapse Problem
The source reports that language collapse is a critical issue for enterprises deploying AI in multilingual environments—such as global trade, supply chain management, or customer support—where reasoning must occur in the query language. Previous RL-based approaches tried to enforce language fidelity via binary rewards, but these methods sacrificed accuracy, introduced code-switching during reasoning traces, and consumed excessive tokens. AdaMame addresses these limitations by adaptively aligning the reasoning language to the query language without compromising accuracy.
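To make the baseline concrete, here is a minimal sketch of a binary language fidelity reward added to an accuracy reward. The function name, the `weight` parameter, and the assumption that a language tag for the reasoning trace is already available are illustrative and are not taken from the paper.

```python
def binary_fidelity_reward(is_correct: bool, trace_lang: str,
                           query_lang: str, weight: float = 1.0) -> float:
    """Accuracy reward plus an all-or-nothing language bonus (hypothetical form)."""
    accuracy = 1.0 if is_correct else 0.0
    # Binary: the bonus is either fully granted or fully withheld.
    fidelity = 1.0 if trace_lang == query_lang else 0.0
    return accuracy + weight * fidelity


# A correct answer reasoned in English for a Korean query earns no bonus.
print(binary_fidelity_reward(True, "en", "ko"))  # 1.0
print(binary_fidelity_reward(True, "ko", "ko"))  # 2.0
```

Because the bonus is fixed from the first training step, a reward of this shape pushes toward the query language immediately, which is the behavior the adaptive approach is meant to relax.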
Inside AdaMame: Two-Stage Training
AdaMame consists of two stages, as described in the paper:
| Stage | Method | Purpose |
|---|---|---|
| 1 | Supervised Fine-Tuning (SFT) | Fine-tunes on naturally occurring reasoning traces across five languages to establish multilingual reasoning capability. |
| 2 | Reinforcement Learning (RL) | Adapts Group Relative Policy Optimization (GRPO) with a query-conditioned alignment factor that grows progressively during training. |
The second stage, called AdaMame-GRPO, modifies Group Relative Policy Optimization (GRPO). According to the source, its query-conditioned alignment factor grows progressively during training, so the model first explores diverse reasoning languages and only later exploits reasoning in the query language. The paper credits this progressive alignment with avoiding the trade-offs seen in prior methods.
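The explore-then-exploit idea can be sketched as follows. This is not the paper's formulation: the linear schedule, the additive reward shaping, and every name here (`alignment_factor`, `shaped_rewards`, `max_factor`) are assumptions chosen for illustration. Only the group-relative normalization, in which each reward is centered and scaled by its group's statistics, follows the general GRPO idea.

```python
from statistics import mean, pstdev


def alignment_factor(step: int, total_steps: int, max_factor: float = 1.0) -> float:
    """Hypothetical linear schedule: 0 at the start, max_factor at the end."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_factor * progress


def shaped_rewards(correct, trace_langs, query_lang, step, total_steps):
    """Accuracy reward plus a language-match bonus scaled by the schedule (assumed form)."""
    factor = alignment_factor(step, total_steps)
    rewards = []
    for is_correct, lang in zip(correct, trace_langs):
        accuracy = 1.0 if is_correct else 0.0
        match = 1.0 if lang == query_lang else 0.0
        rewards.append(accuracy + factor * match)
    return rewards


def group_relative_advantages(rewards, eps: float = 1e-6):
    """Center and scale rewards within one sampled group of responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of four sampled traces for a Korean query.
correct = [True, True, False, False]
langs = ["en", "ko", "en", "ko"]
early = group_relative_advantages(shaped_rewards(correct, langs, "ko", 0, 100))
late = group_relative_advantages(shaped_rewards(correct, langs, "ko", 100, 100))
```

In this toy group, the two correct traces receive the same advantage at step 0 regardless of language, while at the final step the correct Korean trace is preferred over the correct English one.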
Results and Performance
The research evaluated AdaMame across two benchmarks, two LRMs, and 12 languages. The source states that AdaMame-GRPO achieves Pareto-optimal performance across reasoning accuracy, language fidelity, and token efficiency over all baselines. The strongest gains were observed on out-of-domain, lower-resource languages—a promising result for global enterprises serving diverse linguistic markets.
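For readers unfamiliar with the term, a method is Pareto-optimal when no other method is at least as good on every metric and strictly better on at least one. The sketch below checks this over three metrics. The method names and numbers are invented placeholders and are not results from the paper. Token usage is negated so that higher is better on every axis.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the names of all methods that no other method dominates."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    }


# Placeholder scores: (accuracy, language fidelity, negative token count).
points = {
    "method_a": (0.70, 0.95, -900),
    "method_b": (0.65, 0.90, -1200),
    "method_c": (0.72, 0.60, -1500),
}
front = pareto_front(points)
print(sorted(front))  # ['method_a', 'method_c']
```

Here `method_b` is dominated by `method_a` on all three axes, while `method_c` stays on the front only because of its slightly higher accuracy.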
Implications for Enterprise AI
For enterprise technology decision-makers, especially those in logistics and supply chain, the ability to reason accurately in the user's language matters for document processing, customs declarations, and trade finance. AdaMame's approach of adaptively aligning the reasoning language without accuracy loss could enable more reliable multilingual AI systems, and its two-stage recipe combining SFT and GRPO offers a template for improving LRMs in production settings. The source focuses on mathematical reasoning, so transfer to other domains is untested there; still, the underlying principle of adaptive alignment may carry over to other tasks that require multilingual reasoning, including trade documentation and compliance.
In summary, AdaMame presents a training recipe that, according to the paper, overcomes the accuracy–language fidelity trade-off, offering enterprises a path to deploy LRMs that reason in the query language. The research is publicly available on arXiv under a CC BY 4.0 license, encouraging further development and adoption.