2603.29292v1 Mar 31, 2026 cs.SE

Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus

Huan Zhang

Citations: 31

h-index: 1

Wei Cheng

Citations: 98

h-index: 3

Wei Hu

Citations: 87

h-index: 3

Abstract

Improving the code generation capabilities of large language models (LLMs) typically relies on supervised fine-tuning or preference optimization, both of which require costly external resources such as powerful teacher models or reliable test units. However, in real-world scenarios, it is much harder to obtain reference solutions and test oracles than problem descriptions and test inputs. In this paper, we tackle a challenging yet realistic question: Can a code language model improve itself without access to a superior teacher and a test oracle? To answer this, we propose ConSelf, a self-improving approach built upon two key ideas. First, we introduce code semantic entropy, a novel metric that measures problem-level uncertainty by assessing the functional diversity of program behaviors, enabling a curriculum construction with the most learnable problems. Second, we present consensus-driven direct preference optimization (Con-DPO), a preference-based fine-tuning method that weights each preference pair by its behavioral consensus, thereby mitigating the impact of noisy self-generated supervision. Experiments on various benchmarks and backbone LLMs demonstrate that ConSelf significantly outperforms baselines, validating the effectiveness of semantic entropy-based curriculum construction and consensus-driven optimization in improving code generation without external supervision.
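
The abstract does not give formulas, so the following is only a minimal sketch of how the two ideas could fit together, under stated assumptions. It assumes that each sampled program is summarized by a hashable "behavior signature" (for example, the tuple of its outputs on the available test inputs), that code semantic entropy is the Shannon entropy over groups of identical signatures, that behavioral consensus is the fraction of samples sharing a signature, and that Con-DPO multiplies the standard DPO loss by that fraction. All function names and the `beta` default are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def semantic_entropy(signatures: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) over groups of behaviorally identical programs.

    Assumption: two programs are "functionally equivalent" when their
    output signatures on the test inputs are identical.
    """
    if not signatures:
        raise ValueError("need at least one sampled program")
    n = len(signatures)
    counts = Counter(signatures)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def consensus(signatures: Sequence[Hashable], signature: Hashable) -> float:
    """Fraction of sampled programs whose behavior matches `signature`."""
    if not signatures:
        raise ValueError("need at least one sampled program")
    return Counter(signatures)[signature] / len(signatures)


def weighted_dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    weight: float,
    beta: float = 0.1,  # hypothetical default, not from the paper
) -> float:
    """Standard DPO loss for one preference pair, scaled by `weight`.

    Assumption: the consensus weight enters as a simple multiplier.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return weight * math.log1p(math.exp(-margin))


# Four sampled programs run on two test inputs; three agree, one differs.
samples = [(1, 2), (1, 2), (1, 2), (0, 2)]
entropy = semantic_entropy(samples)      # low entropy: the model mostly agrees with itself
w = consensus(samples, (1, 2))           # weight for a pair whose chosen program is in the majority group
loss = weighted_dpo_loss(-5.0, -7.0, -6.0, -6.0, weight=w)
```

In this reading, problems whose samples all behave identically (entropy 0) or all differently (maximal entropy) carry little usable signal, which is one plausible way to interpret "most learnable"; the paper's actual selection rule may differ.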

1 Citation

0 Influential

1.5 Altmetric

8.5 Score
