2606.16140v1 Jun 15, 2026 cs.AI

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

Xingran Zhou
Yirong Chen
Wei Wang
Sen Xu
Shixiaoqi Liu
Jixin Min
Yingwei Dai
Zhibin Yin
Junlin Zhang

This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters, developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. Building on the Spectrum-to-Signal post-training paradigm, we systematically enhance the model through an optimized pipeline comprising curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. Experimental evaluations show that VibeThinker-3B achieves frontier-level performance on highly demanding verifiable tasks. Specifically, it attains a score of 94.3 on AIME26 (improving to 97.1 with claim-level test-time scaling) and a Pass@1 of 80.2 on LiveCodeBench v6, and it exhibits strong out-of-distribution generalization with a 96.1% acceptance rate on recent unseen LeetCode contests. These results place it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro. Furthermore, a score of 93.4 on IFEval indicates that this extreme reasoning enhancement does not compromise strict instruction controllability. Extending our previous 1.5B work, these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, whereas open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios. This perspective suggests that compact models are not merely deployment-efficient substitutes, but a complementary path toward frontier-level performance in parameter-dense capability regimes.
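
For readers unfamiliar with the Pass@1 metric quoted above, the sketch below shows the widely used unbiased pass@k estimator, averaged over problems and scaled to a 0-100 score. It is an illustration only: the abstract does not state how many samples per problem were drawn or which estimator the authors used, so the function names (`pass_at_k`, `benchmark_pass_at_k`) and the `(n, c)` input layout are assumptions made here for the example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of sampled solutions, c: how many of them passed, k: budget.
    Equals the probability that at least one of k solutions drawn without
    replacement from the n samples is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, as a 0-100 score.

    results: hypothetical per-problem (n, c) pairs, an assumed layout.
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two toy problems: 2 of 4 samples correct, then 4 of 4 correct.
    print(benchmark_pass_at_k([(4, 2), (4, 4)], k=1))  # prints 75.0
```

With k = 1 the estimator reduces to the fraction of correct samples per problem, so a Pass@1 of 80.2 can be read as the average per-sample success rate across the benchmark, under the assumption that this estimator was used.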
