2606.13473v1 Jun 11, 2026 cs.LG

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

Jiacheng Chen, Weiyu Cheng, Zehan Li, Binyan Jiang, Han Ding, Pengyu Zhao, Jingyang Li, F. Yu, Shunkai Zhang, Zhengmao Zhu, Xinyu Zhang, Yanmohan Wang, Lin Li, Tiancheng Qin, Qin Wang, Tianle Li, Jin-Feng Zhu, Chenyu Du, Zijian Song, Jiayuan Song, Zhi Zhang, Yunan Huang, Yuntao Cheng

We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. The M3 training pipeline first develops three proof-oriented capabilities (proof generation, proof verification, and critique-conditioned proof repair) using a defense-in-depth generative verifier engineered for a low false-positive rate. These capabilities are merged into a single released M3 model. At test time, MaxProof uses this one model in four roles (generator, verifier, refiner, and ranker), searches over a population of candidate proofs, and returns a single final proof through tournament selection. With MaxProof test-time scaling, the M3 model scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, meeting or exceeding the human gold-medal threshold on both.
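The abstract names the four roles the single M3 model plays at test time but does not spell out the search schedule. The Python sketch below shows one way a population-level generate/verify/refine/rank loop could fit together, under loudly stated assumptions: `generate`, `verify`, `refine`, and `compare` are hypothetical callables standing in for prompting the model in each role, and keeping the top candidates by verifier score, a fixed number of refinement rounds, and a single-elimination bracket are illustrative choices, not MaxProof's published algorithm.

```python
# Hypothetical sketch of population-level test-time scaling in the spirit of
# MaxProof. Role callables, population sizes, and the selection schedule are
# illustrative assumptions, not the procedure published in the paper.
import itertools
import random
from dataclasses import dataclass
from typing import Callable, Tuple

Verify = Callable[[str, str], Tuple[float, str]]  # (problem, proof) -> (score, critique)


@dataclass
class Candidate:
    proof: str
    score: float   # verifier's estimate that the proof is correct, in [0, 1]
    critique: str  # verifier feedback that the refiner conditions on


def population_search(problem: str,
                      generate: Callable[[str], str],
                      verify: Verify,
                      refine: Callable[[str, str, str], str],
                      compare: Callable[[str, str, str], bool],
                      population_size: int = 8, rounds: int = 2,
                      keep: int = 4, seed: int = 0) -> str:
    rng = random.Random(seed)

    def evaluate(proof: str) -> Candidate:
        score, critique = verify(problem, proof)
        return Candidate(proof, score, critique)

    # Generator role: sample an initial population of candidate proofs.
    population = [evaluate(generate(problem)) for _ in range(population_size)]

    for _ in range(rounds):
        # Verifier role: keep the `keep` highest-scoring candidates as parents.
        population.sort(key=lambda c: c.score, reverse=True)
        parents = population[:keep]
        # Refiner role: repair each parent conditioned on its critique.
        children = [evaluate(refine(problem, c.proof, c.critique)) for c in parents]
        population = parents + children

    # Ranker role: single-elimination tournament over the top `keep` finalists.
    population.sort(key=lambda c: c.score, reverse=True)
    bracket = [c.proof for c in population[:keep]]
    rng.shuffle(bracket)  # random seeding of the bracket
    while len(bracket) > 1:
        winners = [a if compare(problem, a, b) else b
                   for a, b in zip(bracket[0::2], bracket[1::2])]
        if len(bracket) % 2 == 1:
            winners.append(bracket[-1])  # odd candidate out gets a bye
        bracket = winners
    return bracket[0]


def make_toy_roles():
    """Hypothetical toy roles: a 'proof' is just the string 'q=<quality>'."""
    counter = itertools.count()

    def quality(proof: str) -> int:
        return int(proof.split("=")[1])

    def generate(problem: str) -> str:
        return f"q={next(counter) % 5}"  # initial qualities cycle through 0..4

    def verify(problem: str, proof: str) -> Tuple[float, str]:
        return quality(proof) / 10, "tighten the argument"

    def refine(problem: str, proof: str, critique: str) -> str:
        return f"q={min(quality(proof) + 1, 10)}"  # each repair adds one point

    def compare(problem: str, a: str, b: str) -> bool:
        return quality(a) >= quality(b)  # True means `a` wins the match

    return generate, verify, refine, compare


if __name__ == "__main__":
    print(population_search("toy problem", *make_toy_roles()))  # prints q=6
```

With the toy roles, where each repair raises a candidate's quality by one, two refinement rounds lift the best candidate from `q=4` to `q=6`, and the tournament returns it. Keeping absolute scoring (`verify`) separate from pairwise ranking (`compare`) reflects the abstract's distinction between the verifier and ranker roles.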
