2604.04898v1 Apr 06, 2026 cs.AI

QED-Nano: 작은 모델을 활용하여 어려운 정리를 증명하도록 학습시키기

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

Amrith Rajagopal Setlur

Carnegie Mellon University

Citations: 1,102

h-index: 14

Jia Li

Citations: 168

h-index: 4

Yuxiao Qu

Citations: 349

h-index: 6

Aviral Kumar

Citations: 399

h-index: 7

LM-Provers

Citations: 2

h-index: 1

Jasper Dekoninck

Citations: 515

h-index: 10

E. Beeching

Citations: 1,068

h-index: 7

Ian Wu

Citations: 82

h-index: 4

Lewis Tunstall

Citations: 718

h-index: 4

최근에는 독점적인 인공지능 시스템들이 복잡한 증명 기반 문제에서 뛰어난 성능을 보여주며, 2025년 국제수학올림피아드(IMO)에서 최고 수준의 성과를 기록했습니다. 그러나 이러한 시스템 뒤에 숨겨진 학습 과정은 대부분 공개되지 않았으며, 대규모의 "내부" 모델과 구조에 대한 의존성으로 인해 운영 비용이 많이 들고, 재현하기 어렵고, 연구하거나 개선하기 어렵습니다. 이러한 상황에서 다음과 같은 핵심 질문이 제기됩니다. 작은 규모의 공개 모델도 어려운 올림피아드 수준의 수학 문제에 대해 경쟁력 있는 추론 성능을 달성하도록 학습될 수 있는가? 본 논문에서는 QED-Nano라는 40억 파라미터 모델을 구축하여 이 질문에 답합니다. QED-Nano는 올림피아드 수준의 증명을 위해 추가 학습되었습니다. 저희의 학습 방법은 세 단계로 구성됩니다. (1) DeepSeek-Math-V2에서 지식을 추출하여 좋은 증명 작성 스타일을 학습시키는 지도 학습 미세 조정, (2) 채점 기준 기반의 보상을 사용하는 강화 학습(RL), (3) 추론 캐시를 활용하여 강화 학습을 확장하는 방식으로, 긴 증명을 반복적인 요약 및 개선 사이클로 분해하여 추론 성능을 향상시킵니다. QED-Nano는 Nomos-1 및 GPT-OSS-120B와 같은 훨씬 더 큰 규모의 공개 모델보다 증명 생성 성능이 뛰어나며, Gemini 3 Pro와 같은 독점 모델의 성능에 근접하면서도 추론 비용은 훨씬 적게 듭니다. 공개 수학적 추론에 대한 추가 연구를 지원하기 위해, QED-Nano 모델(QED-Nano 및 QED-Nano-SFT), FineProofs-SFT 및 FineProofs-RL 데이터셋, 그리고 학습 및 평가 코드를 포함한 전체 QED-Nano 파이프라인을 공개합니다.

Original Abstract

Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" models and scaffolds makes them expensive to run, difficult to reproduce, and hard to study or improve upon. This raises a central question: can small, open models also be trained to achieve competitive reasoning performance on difficult Olympiad-level math? In this paper, we answer this question by building QED-Nano, a 4B model post-trained for Olympiad-level proofs. Our training recipe has three stages: (1) supervised fine-tuning to imbue good proof-writing styles by distilling from DeepSeek-Math-V2, (2) reinforcement learning (RL) with rubric-based rewards, and (3) expanding RL with a reasoning cache, which decomposes long proofs into iterative summarize-and-refine cycles and enables stronger test-time reasoning. QED-Nano surpasses the proof-generation performance of much larger open models, including Nomos-1 and GPT-OSS-120B, and approaches the performance of proprietary models like Gemini 3 Pro, at a fraction of the inference cost. To support further research on open mathematical reasoning, we release the full QED-Nano pipeline, including the QED-Nano and QED-Nano-SFT models, the FineProofs-SFT and FineProofs-RL datasets, and the training and evaluation code.

2 Citations

0 Influential

7 Altmetric

37.0 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!