2603.19715v1 Mar 20, 2026 cs.AI

Stepwise: 신경-기호 증명 탐색을 위한 자동 시스템 검증

Stepwise: Neuro-Symbolic Proof Search for Automated Systems Verification

Zenan Li

Citations: 214

h-index: 7

Zhendong Su

Citations: 22

h-index: 3

Baoding He

Citations: 0

h-index: 0

Wei Sun

Citations: 2

h-index: 1

Yuan Yao

Citations: 79

h-index: 5

Taolue Chen

Citations: 202

h-index: 5

Xiaoxing Ma

Citations: 116

h-index: 5

대화형 정리 증명을 통한 형식 검증은 중요한 시스템의 정확성을 보장하는 데 점점 더 많이 사용되고 있지만, 대규모 증명 스크립트 작성은 여전히 매우 수동적인 작업이며 확장성을 제한합니다. 특히 수학적 추론 분야에서 큰 언어 모델(LLM)의 발전은 소프트웨어 검증에 LLM을 통합하는 데 더욱 큰 가능성을 제시합니다. 본 논문에서는 시스템 수준 검증 프로젝트를 위한 증명 탐색을 자동화하도록 설계된 신경-기호 증명 생성 프레임워크를 소개합니다. 이 프레임워크는 증명 상태를 기반으로 한 최적의 우선순위 트리 탐색을 수행하며, LLM에 다음 후보 증명 단계를 반복적으로 쿼리합니다. 신경망 측면에서는 증명 상태-단계 쌍 데이터 세트를 사용하여 LLM을 미세 조정하고, 기호 측면에서는 거부된 단계를 수정하고, 증명 상태를 필터링하고 순위를 매기며, 탐색 진행이 중단될 때 하위 목표를 자동으로 해결하기 위해 다양한 ITP 도구를 통합합니다. 이러한 시너지 효과는 데이터 효율적인 LLM 적응을 가능하게 하며, 의미론 정보를 활용하여 탐색 공간을 효율적으로 줄입니다. 우리는 이 프레임워크를 새로운 Isabelle REPL에 구현하여 세분화된 증명 상태와 자동화 도구를 제공하고, FVEL seL4 벤치마크 및 추가 Isabelle 개발 프로젝트에서 평가했습니다. seL4의 경우, 시스템은 최대 77.6%의 정리를 증명하여 이전의 LLM 기반 접근 방식 및 독립 실행형 Sledgehammer보다 훨씬 뛰어난 성능을 보였으며, 훨씬 더 많은 다단계 증명을 해결했습니다. 추가 벤치마크 결과는 강력한 일반화 능력을 보여주며, 확장 가능한 자동 소프트웨어 검증을 위한 실현 가능한 경로를 제시합니다.

Original Abstract

Formal verification via interactive theorem proving is increasingly used to ensure the correctness of critical systems, yet constructing large proof scripts remains highly manual and limits scalability. Advances in large language models (LLMs), especially in mathematical reasoning, make their integration into software verification increasingly promising. This paper introduces a neuro-symbolic proof generation framework designed to automate proof search for systems-level verification projects. The framework performs a best-first tree search over proof states, repeatedly querying an LLM for the next candidate proof step. On the neural side, we fine-tune LLMs using datasets of proof state-step pairs; on the symbolic side, we incorporate a range of ITP tools to repair rejected steps, filter and rank proof states, and automatically discharge subgoals when search progress stalls. This synergy enables data-efficient LLM adaptation and semantics-informed pruning of the search space. We implement the framework on a new Isabelle REPL that exposes fine-grained proof states and automation tools, and evaluate it on the FVEL seL4 benchmark and additional Isabelle developments. On seL4, the system proves up to 77.6\% of the theorems, substantially surpassing previous LLM-based approaches and standalone Sledgehammer, while solving significantly more multi-step proofs. Results across further benchmarks demonstrate strong generalization, indicating a viable path toward scalable automated software verification.

0 Citations

0 Influential

3.5 Altmetric

17.5 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!