2603.19828v1 Mar 20, 2026 cs.AI

FormalEvolve: 신경-기호 진화 탐색을 이용한 다양하고 검증 가능한 자동 형식화

FormalEvolve: Neuro-Symbolic Evolutionary Search for Diverse and Prover-Effective Autoformalization

Jing Liu

Citations: 114

h-index: 5

Haijian Lu

Citations: 192

h-index: 7

Wei Wang

Citations: 97

h-index: 2

자동 형식화는 자연어 수학을 컴파일 가능하고 기계적으로 검증 가능한 형태로 변환하는 것을 목표로 합니다. 그러나 의미적 일관성은 검증 효율성을 보장하지 않습니다. 의미적으로 일관된 형식화 결과라도 증명 탐색 비용과 성공률에서 크게 다를 수 있습니다. 본 연구에서는 자동 형식화를 의미적으로 일관된 후보 집합을 찾는 예산 기반의 테스트 시간 탐색 문제로 정의하고, 컴파일 게이트를 사용하는 신경-기호 진화 프레임워크인 FormalEvolve를 제안합니다. FormalEvolve는 LLM 기반의 돌연변이와 교차, 그리고 제한적인 패치 수정을 통해 다양한 후보를 생성하며, 기호 추상 구문 트리(AST) 재작성 연산을 통해 구조적 다양성을 더욱 강화합니다. CombiBench 및 ProofNet 데이터셋에서, 엄격한 생성 호출 예산 T = 100 하에서 FormalEvolve는 의미적 적중률(SH@100)을 각각 58.0% 및 84.9%로 달성했으며, 문제 간 의미적 성공 분포의 집중도를 낮춰 Gini 계수를 감소시켰습니다. 또한, 고정된 검증 예산 하에서 FormalEvolve는 CombiBench 데이터셋에서 하위 검증 성능을 향상시켰습니다. 코드 배포 예정입니다.

Original Abstract

Autoformalization aims to translate natural-language mathematics into compilable, machine-checkable statements. However, semantic consistency does not imply prover effectiveness: even semantically consistent formalizations can differ substantially in proof-search cost and success rate. In this work, we formulate autoformalization as a budgeted, test-time search for semantically consistent repertoires, and propose FormalEvolve, a compilation-gated neuro-symbolic evolutionary framework. FormalEvolve generates diverse candidates via LLM-driven mutation and crossover with bounded patch repair, while symbolic Abstract Syntax Tree (AST) rewrite operations further inject structural diversity. On CombiBench and ProofNet, under a strict generator-call budget of T = 100, FormalEvolve reaches semantic hit rates (SH@100) of 58.0% and 84.9%, and reduces cross-problem concentration of semantic successes(lower Gini). Under a fixed prover budget, FormalEvolve also improves downstream proving performance on CombiBench. Code will be released publicly.

0 Citations

0 Influential

3.5 Altmetric

17.5 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!