2606.06468v1 Jun 04, 2026 cs.AI

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

Ziran Yang

Citations: 123

h-index: 3

Mengdi Wang

Citations: 745

h-index: 13

Chi Jin

Citations: 176

h-index: 4

J.H. Chung

Citations: 0

h-index: 0

Hongzhou Lin

Citations: 240

h-index: 3

Shange Tang

Citations: 388

h-index: 7

Liam H. Fowl

Citations: 2,465

h-index: 20

Xingyu Dang

Citations: 101

h-index: 3

Danqi Chen

Citations: 116

h-index: 1

Rohit Agarwal

Citations: 105

h-index: 4

Qishuo Yin

Citations: 11

h-index: 2

Rodrigo Porto

Citations: 5

h-index: 1

Narutatsu Ri

Citations: 278

h-index: 6

Sanjeev Arora

Citations: 1,131

h-index: 8

Ziyang Cai

Citations: 63

h-index: 4

Zihao Li

Citations: 9

h-index: 2

Simon Park

Citations: 89

h-index: 4

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemmas, along with declared dependencies. This blueprint is optionally guided by a natural language proof. Then, a tool-equipped Lean prover component closes each open lemma node in parallel using relevant dependencies. Failed lemmas in turn drive refinement of the global blueprint. This strategy contrasts with other mainstream approaches which use recursive lemma decomposition, and can inefficiently loop on dead-end strategies. Using the open-weight DeepSeek-V4-Flash (284B-A13B) as the backbone, Goedel-Architect attains 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. With an optional natural-language proof seeding the initial blueprint on the harder problems, we additionally close the remaining two MiniF2F-test problems (reaching 100%), lift PutnamBench to 88.8% (597/672), and solve 4/6 on IMO 2025, 11/12 on Putnam 2025, and 3/6 on USAMO 2026. This represents state-of-the-art performance for an open-source pipeline at a price point up to 500x less than comparable open-source pipelines.

0 Citations

0 Influential

10 Altmetric

50.0 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!