A compact language model with only 3 billion parameters matches the verifiable reasoning performance of models many times its size, according to a technical report published on arXiv. The model, named VibeThinker-3B, was developed to explore how far reasoning capabilities can be pushed within a strictly small-model regime.
Model Architecture and Training Pipeline
VibeThinker-3B is a dense model with 3 billion parameters, built upon the Spectrum-to-Signal post-training paradigm. The paper describes an optimized pipeline with three stages: curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. According to the authors, this combination systematically enhances the model's reasoning ability while keeping the parameter count low.
Benchmark Performance and Comparative Analysis
Experimental evaluations reported in the paper show frontier-level performance on demanding verifiable tasks. The table below summarizes key scores:
| Benchmark | Score | Notes |
|---|---|---|
| AIME26 | 94.3 | Improves to 97.1 with claim-level test-time scaling |
| LiveCodeBench v6 | 80.2 Pass@1 | - |
| LeetCode unseen contests | 96.1% acceptance rate | Out-of-distribution generalization |
| IFEval | 93.4 | Measures instruction controllability |
The paper states that these results place VibeThinker-3B "in the performance band of first-tier reasoning systems," matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro. The authors present the IFEval score of 93.4 as evidence that the heavy emphasis on reasoning does not come at the cost of strict instruction controllability.
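For readers unfamiliar with the Pass@1 metric reported for LiveCodeBench v6, the sketch below shows how pass@k is commonly computed using the unbiased estimator popularized by the Codex evaluation paper (Chen et al., 2021). This summary does not say how many samples per problem the VibeThinker-3B authors drew or which estimator they used, so the function names and the sample counts here are illustrative assumptions, not the paper's evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: sample budget being scored (k=1 for Pass@1)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results, NOT data from the paper:
# three problems, four samples each, with 4, 2, and 0 passing samples.
hypothetical_results = [(4, 4), (4, 2), (4, 0)]
score = mean_pass_at_k(hypothetical_results, k=1)
```

With k=1 the estimator reduces to the fraction of passing samples per problem, averaged over problems, which is why a Pass@1 figure can be read as the chance that a single generated solution is correct.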
The Parametric Compression-Coverage Hypothesis
Extending the authors' previous work on a 1.5B model, the report introduces the Parametric Compression-Coverage Hypothesis. This view posits that verifiable reasoning can be compressed into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios. The paper suggests that compact models are not merely deployment-efficient substitutes, but a complementary path toward frontier-level performance in parameter-dense capability regimes.
The authors of the report are Xu, Sen; Liu, Shixi; Wang, Wei; Min, Jixin; Dai, Yingwei; Zhibin; Chen, Yirong; Zhou, Xin; and Zhang, Junlin. The full paper is available under a CC Zero license on arXiv with identifier 2606.16140.