2604.19000v1 Apr 21, 2026 cs.LG

분해, 구조화, 그리고 수정: 연산자 트리 기반 자동 형식화 신경-기호 프레임워크

Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees

Xiaoyang Liu

Citations: 26

h-index: 3

Zineng Dong

Citations: 5

h-index: 1

Yi Bai

Citations: 1

h-index: 1

Yantao Li

Citations: 597

h-index: 2

Yuntian Liu

Shanghai Jiao Tong University

Citations: 26

h-index: 3

Tao Luo

Citations: 26

h-index: 3

자동 형식화는 인간의 수학과 형식 수학 사이의 중요한 연결 고리 역할을 하며, 자연어 문제를 형식 언어로 번역합니다. 기존 연구들은 주로 데이터 합성 및 다양한 학습 패러다임을 통해 엔드-투-엔드 대규모 언어 모델(LLM)을 최적화하는 데 초점을 맞추었지만, 일반적으로 형식 코드를 평면적인 시퀀스로 취급하여 수학적 진술에 내재된 계층적 논리를 간과합니다. 본 연구에서는 자동 형식화를 모듈화된 파이프라인으로 재구성하는 신경-기호 프레임워크인 Decompose, Structure, and Repair (DSR)를 소개합니다. DSR은 진술을 논리적 구성 요소로 분해하고 이를 구조화된 연산자 트리에 매핑하며, 이 토폴로지 기반의 청사진을 활용하여 하위 트리 정제를 통해 오류를 정확하게 위치시키고 수정합니다. 또한, 우리는 표준 교재에서 선별된 156개의 학부 및 대학원 수준의 정리들을 포함하는 벤치마크인 PRIME을 소개하며, 이 데이터셋은 Lean 4로 전문적으로 주석 처리되었습니다. 실험 결과는 DSR이 새로운 최고 성능을 달성하며, 동일한 계산 자원을 사용할 때 기존 모델들을 지속적으로 능가한다는 것을 보여줍니다. 데이터셋, 모델, 코드는 곧 공개될 예정입니다.

Original Abstract

Statement autoformalization acts as a critical bridge between human mathematics and formal mathematics by translating natural language problems into formal language. While prior works have focused on data synthesis and diverse training paradigms to optimize end-to-end Large Language Models (LLMs), they typically treat formal code as flat sequences, neglecting the hierarchical logic inherent in mathematical statements. In this work, we introduce Decompose, Structure, and Repair (DSR), a neuro-symbolic framework that restructures autoformalization into a modular pipeline. DSR decomposes statements into logical components and maps them to structured operator trees, leveraging this topological blueprint to precisely localize and repair errors via sub-tree refinement. Furthermore, we introduce PRIME, a benchmark of 156 undergraduate and graduate-level theorems selected from canonical textbooks and expertly annotated in Lean 4. Experimental results demonstrate that DSR establishes a new state-of-the-art, consistently outperforming baselines under equivalent computational budgets. The datasets, model, and code will be released to the public soon.

1 Citations

0 Influential

1.5 Altmetric

8.5 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!