2601.14027v1 Jan 20, 2026 cs.AI

Numina-Lean-Agent: 정형 수학을 위한 개방형 범용 에이전트 추론 시스템

Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics

Junqi Liu

Citations: 47

h-index: 2

Zihao Zhou

Xi’an Jiaotong-liverpool University

Citations: 179

h-index: 6

Zekai Zhu

Citations: 179

h-index: 2

Marco Dos Santos

University of Cambridge

Citations: 797

h-index: 5

Ran Wang

Citations: 216

h-index: 4

Yunzhou Xie

Citations: 46

h-index: 2

Qiufeng Wang

Citations: 22

h-index: 1

Lihong Zhi

Citations: 51

h-index: 3

Jia Li

Citations: 224

h-index: 4

Jiawei Liu

Citations: 181

h-index: 2

Weikun He

Citations: 26

h-index: 2

Wenda Li

Citations: 368

h-index: 6

Junqiao Zhao

Citations: 44

h-index: 3

에이전트 시스템은 최근 정형 정리 증명 분야의 주류 패러다임으로 자리 잡았으며, 다수의 모델과 도구를 조율하여 강력한 성능을 달성하고 있습니다. 그러나 기존 접근 방식은 종종 특정 작업에 특화된 파이프라인이나 별도로 훈련된 정형 증명기에 의존하여 유연성과 재현성에 한계가 있습니다. 본 논문에서는 범용 코딩 에이전트를 정형 수학 추론기로 직접 활용하는 패러다임을 제안합니다. 이 패러다임은 (1) 범용 코딩 에이전트가 증명 이외의 다양한 추론 작업에 대해 자연스러운 인터페이스를 제공하고, (2) 별도의 훈련 없이 기반 모델(base model)을 교체하는 것만으로 성능을 향상시킬 수 있으며, (3) MCP를 통해 복잡한 설계 없이도 유연한 확장과 전문 도구의 자율적 호출이 가능하다는 점에 착안했습니다. 이러한 패러다임을 바탕으로 우리는 Claude Code와 Numina-Lean-MCP를 결합한 Numina-Lean-Agent를 소개합니다. 이 에이전트는 Lean과의 자율적인 상호작용, 관련 정리 검색, 비정형 증명(informal proving) 및 보조 추론 도구 활용이 가능합니다. Claude Opus 4.5를 기반 모델로 사용한 Numina-Lean-Agent는 2025년 퍼트남(Putnam) 대회의 모든 문제(12/12)를 해결하여 최고 수준의 비공개 시스템과 동등한 성능을 보였습니다. 벤치마크 평가를 넘어, 수학자들과의 상호작용을 통해 브라스캠프-리브(Brascamp-Lieb) 정리를 성공적으로 정형화함으로써 그 범용성을 입증했습니다. Numina-Lean-Agent와 모든 솔루션은 https://github.com/project-numina/numina-lean-agent 에서 공개됩니다.

Original Abstract

Agentic systems have recently become the dominant paradigm for formal theorem proving, achieving strong performance by coordinating multiple models and tools. However, existing approaches often rely on task-specific pipelines and trained formal provers, limiting their flexibility and reproducibility. In this paper, we propose the paradigm that directly uses a general coding agent as a formal math reasoner. This paradigm is motivated by (1) A general coding agent provides a natural interface for diverse reasoning tasks beyond proving, (2) Performance can be improved by simply replacing the underlying base model, without training, and (3) MCP enables flexible extension and autonomous calling of specialized tools, avoiding complex design. Based on this paradigm, we introduce Numina-Lean-Agent, which combines Claude Code with Numina-Lean-MCP to enable autonomous interaction with Lean, retrieval of relevant theorems, informal proving and auxiliary reasoning tools. Using Claude Opus 4.5 as the base model, Numina-Lean-Agent solves all problems in Putnam 2025 (12 / 12), matching the best closed-source system. Beyond benchmark evaluation, we further demonstrate its generality by interacting with mathematicians to successfully formalize the Brascamp-Lieb theorem. We release Numina-Lean-Agent and all solutions at https://github.com/project-numina/numina-lean-agent.

24 Citations

4 Influential

50.587264482324 Altmetric

284.9 Score

Original PDF

248

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!