2604.23712v1 Apr 26, 2026 cs.LG

OptProver: 형식적 증명을 통한 지속적인 학습을 통해 올림피아드와 최적화를 연결하다

OptProver: Bridging Olympiad and Optimization through Continual Training in Formal Theorem Proving

Kun Yuan

Citations: 30

h-index: 3

Gongrui Zhang

Citations: 256

h-index: 3

Zaiwen Wen

Citations: 130

h-index: 5

Chenyi Li

Citations: 29

h-index: 4

Yanchen Nie

Citations: 18

h-index: 1

Zhengyu Ming

Citations: 0

h-index: 0

최근 형식적 증명 분야의 발전은 주로 올림피아드 수준의 수학에 집중되어 왔으며, 학부 수준의 영역은 상대적으로 탐색이 부족했습니다. 머신러닝, 운영 연구 및 과학 컴퓨팅에 필수적인 최적화는 기존 증명기의 지원을 제대로 받지 못하고 있습니다. 최적화는 도메인 특화 형식(볼록성, 최적 조건, 알고리즘 분석 등)에 의존하기 때문에 상당한 분포 변화를 야기하며, 따라서 단순한 도메인 전송은 효과적이지 않습니다. 본 논문에서는 올림피아드에서 학부 수준의 최적화로 강력한 전송 성능을 보이는 OptProver라는 학습된 모델을 제시합니다. 강력한 올림피아드 수준의 증명기에서 시작하여, 저희는 두 가지 핵심 혁신을 통해 분포 변화를 완화합니다. 첫째, 전문가의 반복을 통해 최적화에 특화된 대규모 데이터 큐레이션을 수행합니다. 둘째, 퍼플렉시티 가중 최적화와 함께 유효하지만 진전이 없는 증명 단계를 페널티로 부여하는 메커니즘을 통합한 특수 선호도 학습 목표를 도입합니다. 이를 통해 분포 변화를 해결하는 것뿐만 아니라 효율적인 탐색 경로를 안내합니다. 엄격한 평가를 위해 Lean 4 기반의 최적화에 특화된 새로운 벤치마크를 구축했습니다. 이 벤치마크에서 OptProver는 비슷한 크기의 모델 중에서 최고 수준의 Pass@1 및 Pass@32 성능을 달성하는 동시에 일반적인 증명 작업에서도 경쟁력 있는 성능을 유지하며, 파괴적인 망각 없이 효과적인 도메인 전송을 보여줍니다.

Original Abstract

Recent advances in formal theorem proving have focused on Olympiad-level mathematics, leaving undergraduate domains largely unexplored. Optimization, fundamental to machine learning, operations research, and scientific computing, remains underserved by existing provers. Its reliance on domain-specific formalisms (convexity, optimality conditions, and algorithmic analysis) creates significant distribution shift, making naive domain transfer ineffective. We present OptProver, a trained model that achieves robust transfer from Olympiad to undergraduate optimization. Starting from a strong Olympiad-level prover, our pipeline mitigates distribution shift through two key innovations. First, we employ large-scale optimization-focused data curation via expert iteration. Second, we introduce a specialized preference learning objective that integrates perplexity-weighted optimization with a mechanism to penalize valid but non-progressing proof steps. This not only addresses distribution shifts but also guides the search toward efficient trajectories. To enable rigorous evaluation, we construct a novel benchmark in Lean 4 focused on optimization. On this benchmark, OptProver achieves state-of-the-art Pass@1 and Pass@32 among comparably sized models while maintaining competitive performance on general theorem-proving tasks, demonstrating effective domain transfer without catastrophic forgetting.

0 Citations

0 Influential

2.5 Altmetric

12.5 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!