2606.06468v1 Jun 04, 2026 cs.AI

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

Ziran Yang
Ziran Yang
Citations: 123
h-index: 3
Mengdi Wang
Mengdi Wang
Citations: 745
h-index: 13
Chi Jin
Chi Jin
Citations: 176
h-index: 4
J.H. Chung
J.H. Chung
Citations: 0
h-index: 0
Hongzhou Lin
Hongzhou Lin
Citations: 240
h-index: 3
Shange Tang
Shange Tang
Citations: 388
h-index: 7
Liam H. Fowl
Liam H. Fowl
Citations: 2,465
h-index: 20
Xingyu Dang
Xingyu Dang
Citations: 101
h-index: 3
Danqi Chen
Danqi Chen
Citations: 116
h-index: 1
Rohit Agarwal
Rohit Agarwal
Citations: 105
h-index: 4
Qishuo Yin
Qishuo Yin
Citations: 11
h-index: 2
Rodrigo Porto
Rodrigo Porto
Citations: 5
h-index: 1
Narutatsu Ri
Narutatsu Ri
Citations: 278
h-index: 6
Sanjeev Arora
Sanjeev Arora
Citations: 1,131
h-index: 8
Ziyang Cai
Ziyang Cai
Citations: 63
h-index: 4
Zihao Li
Zihao Li
Citations: 9
h-index: 2
Simon Park
Simon Park
Citations: 89
h-index: 4

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemmas, along with declared dependencies. This blueprint is optionally guided by a natural language proof. Then, a tool-equipped Lean prover component closes each open lemma node in parallel using relevant dependencies. Failed lemmas in turn drive refinement of the global blueprint. This strategy contrasts with other mainstream approaches which use recursive lemma decomposition, and can inefficiently loop on dead-end strategies. Using the open-weight DeepSeek-V4-Flash (284B-A13B) as the backbone, Goedel-Architect attains 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. With an optional natural-language proof seeding the initial blueprint on the harder problems, we additionally close the remaining two MiniF2F-test problems (reaching 100%), lift PutnamBench to 88.8% (597/672), and solve 4/6 on IMO 2025, 11/12 on Putnam 2025, and 3/6 on USAMO 2026. This represents state-of-the-art performance for an open-source pipeline at a price point up to 500x less than comparable open-source pipelines.

0 Citations
0 Influential
10 Altmetric
50.0 Score
Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

Log in to request an AI analysis.

댓글

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!