2602.13695v1 Feb 14, 2026 cs.AI

Can a Lightweight Automated AI Pipeline Solve Research-Level Mathematical Problems?

Wei Zhao (citations: 100, h-index: 4)

Yanzhi Zhang (citations: 41, h-index: 3)

Haoxiang Guan (citations: 43, h-index: 3)

Jiyan He (citations: 45, h-index: 3)

Lv Meng (citations: 0, h-index: 0)

Abstract

Large language models (LLMs) have recently achieved remarkable success in generating rigorous mathematical proofs, with "AI for Math" emerging as a vibrant field of research. While these models have mastered competition-level benchmarks like the International Mathematical Olympiad and show promise in research applications through auto-formalization, their deployment via lightweight, natural-language pipelines for research problems remains underexplored. In this work, we demonstrate that next-generation models (e.g., Gemini 3 Pro, GPT-5.2 Pro), when integrated into a streamlined automated pipeline optimized for citation-based verification, can solve sophisticated research-grade problems. We evaluate our pipeline on two novel datasets: (1) the ICCM problem sets (comparable to the S.-T. Yau College Student Mathematics Contest) proposed by leading mathematicians, and (2) the "First Proof" problem set, consisting of previously unpublished research questions. Our pipeline generated candidate proofs for all problems in the first two ICCM sets and the "First Proof" set. The solutions for the first two ICCM sets and Problem 4 of the "First Proof" set have been fully verified by our team. All generated proofs have been submitted to the official organization, and our generated results are publicly available. We plan to open-source the complete pipeline methodology in due course.

Citations: 0, Influential: 0, Altmetric: 2, Score: 10.0
