2603.29088v1 Mar 31, 2026 cs.SE

WybeCoder: 검증된 명령형 코드 자동 생성

WybeCoder: Verified Imperative Code Generation

Fabian Gloeckle

Citations: 3,772

h-index: 7

Gabriel Synnaeve

Citations: 63,500

h-index: 58

Amaury Hayat

Citations: 560

h-index: 9

Peter O'Hearn

Citations: 70

h-index: 3

Mantas Baksys

Citations: 360

h-index: 4

Darius Feher

Citations: 14

h-index: 1

Kunhao Zheng

Citations: 131

h-index: 5

Sean B. Holden

Citations: 46

h-index: 3

최근 대규모 언어 모델(LLM)의 발전은 자동 코드 생성 및 형식적인 정리 증명을 향상시켰지만, 소프트웨어 검증은 동일한 수준의 발전을 이루지 못했습니다. 이러한 격차를 해소하기 위해, 우리는 증명-중심 개발을 가능하게 하는 에이전트 기반 코드 검증 프레임워크인 WybeCoder를 제안합니다. WybeCoder는 자동 검증 조건 생성 및 SMT 솔버와 Lean 기반의 대화형 증명을 결합한 최근 프레임워크를 기반으로 하며, 코드, 불변 조건 및 증명이 함께 발전하도록 설계되었습니다. 체계적인 평가를 위해, 함수형 검증 벤치마크인 Verina와 Clever를 동등한 명령형 코드 사양으로 변환했습니다. 복잡한 알고리즘인 힙 정렬(Heapsort)의 경우, 우리의 접근 방식을 확장하여 수십 개의 유효한 불변 조건을 합성하고 수십 개의 하위 목표를 처리함으로써, 이전 연구에서 보고된 정체 현상을 극복하고 수백 줄의 검증된 코드를 생성하는 데 있어 일관된 성능 향상을 관찰했습니다. 우리의 최적 시스템은 적당한 컴퓨팅 자원 하에서 Verina 작업의 74%와 Clever 작업의 62%를 해결하며, 이전 평가를 크게 능가하고 검증된 대규모 명령형 코드 데이터 세트의 자동 구축을 위한 길을 열었습니다.

Original Abstract

Recent progress in large language models (LLMs) has advanced automatic code generation and formal theorem proving, yet software verification has not seen the same improvement. To address this gap, we propose WybeCoder, an agentic code verification framework that enables prove-as-you-generate development where code, invariants, and proofs co-evolve. It builds on a recent framework that combines automatic verification condition generation and SMT solvers with interactive proofs in Lean. To enable systematic evaluation, we translate two benchmarks for functional verification in Lean, Verina and Clever, to equivalent imperative code specifications. On complex algorithms such as Heapsort, we observe consistent performance improvements by scaling our approach, synthesizing dozens of valid invariants and dispatching of dozens of subgoals, resulting in hundreds of lines of verified code, overcoming plateaus reported in previous works. Our best system solves 74% of Verina tasks and 62% of Clever tasks at moderate compute budgets, significantly surpassing previous evaluations and paving a path to automated construction of large-scale datasets of verified imperative code.

1 Citations

0 Influential

29 Altmetric

146.0 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!