2604.05159v1 Apr 06, 2026 cs.SE

Planning to Explore: Curiosity-Driven Planning for LLM Test Generation

William Yang Wang (Citations: 5, h-index: 1)
Alfonso Amayuelas (Citations: 515, h-index: 12)
Firas Laakom (Citations: 247, h-index: 9)
Wenyi Wang (Citations: 520, h-index: 8)
Yifan Xu (Citations: 4, h-index: 1)
Yuhui Wang (Citations: 2, h-index: 1)
Jürgen Schmidhuber (Citations: 416, h-index: 2)
Piotr Piekos (Citations: 70, h-index: 4)

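Using LLMs for code generation has naturally extended to testing and evaluating code. As codebases grow in size and complexity, automated test generation becomes increasingly important. Current LLM-based test generation methods rely on strategies that maximize immediate coverage gain. This greedy approach plateaus on code where reaching deep branches requires setup steps that individually yield no new coverage. Drawing on the principles of Bayesian exploration, we treat the program's branch structure as an unknown environment and use an evolving coverage map as a proxy probabilistic posterior over what the LLM has discovered so far. Our method, CovQValue, feeds the coverage map back to the LLM, generates diverse candidate plans in parallel, and selects the most informative plan using LLM-estimated Q-values, balancing immediate branch discovery against future reachability. On TestGenEval Lite, our method outperforms greedy selection, achieving 51-77% higher branch coverage across three popular LLMs and winning on 77-84% of targets. We also build RepoExploreBench, a benchmark for iterative test generation, where it achieves 40-74%. These results show the potential of curiosity-driven planning for LLM-based exploration, enabling more effective discovery of program behavior through sequential interaction.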

Original Abstract

The use of LLMs for code generation has naturally extended to code testing and evaluation. As codebases grow in size and complexity, so does the need for automated test generation. Current approaches for LLM-based test generation rely on strategies that maximize immediate coverage gain, a greedy approach that plateaus on code where reaching deep branches requires setup steps that individually yield zero new coverage. Drawing on principles of Bayesian exploration, we treat the program's branch structure as an unknown environment, and an evolving coverage map as a proxy probabilistic posterior representing what the LLM has discovered so far. Our method, CovQValue, feeds the coverage map back to the LLM, generates diverse candidate plans in parallel, and selects the most informative plan by LLM-estimated Q-values, seeking actions that balance immediate branch discovery with future reachability. Our method outperforms greedy selection on TestGenEval Lite, achieving 51-77% higher branch coverage across three popular LLMs and winning on 77-84% of targets. In addition, we build a benchmark for iterative test generation, RepoExploreBench, where they achieve 40-74%. These results show the potential of curiosity-driven planning methods for LLM-based exploration, enabling more effective discovery of program behavior through sequential interaction.
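
The contrast between greedy selection and Q-value selection described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. The names `CandidatePlan`, `greedy_select`, `q_value`, and `q_select` are hypothetical, the two numeric scores are hard-coded stand-ins for estimates that the paper obtains from an LLM, and the additive form with a `discount` factor is an assumption made here only to show how a setup step with zero immediate coverage can still be preferred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePlan:
    """A hypothetical candidate test plan with two illustrative scores."""
    name: str
    immediate_gain: float       # stand-in: branches expected to be newly covered now
    future_reachability: float  # stand-in: branches expected to become reachable later


def greedy_select(plans):
    """Pick the plan with the largest immediate coverage gain."""
    return max(plans, key=lambda p: p.immediate_gain)


def q_value(plan, discount=0.9):
    """Assumed scoring rule: immediate gain plus discounted future reachability."""
    return plan.immediate_gain + discount * plan.future_reachability


def q_select(plans, discount=0.9):
    """Pick the plan with the largest assumed Q-value."""
    return max(plans, key=lambda p: q_value(p, discount))


# Two illustrative candidates: one covers a little now, the other only does setup.
plans = [
    CandidatePlan("assert_on_defaults", immediate_gain=2.0, future_reachability=0.0),
    CandidatePlan("build_fixture_only", immediate_gain=0.0, future_reachability=6.0),
]

print("greedy picks:", greedy_select(plans).name)
print("q-value picks:", q_select(plans).name)
```

With these made-up numbers, the greedy rule chooses the plan that covers two branches immediately, while the Q-value rule chooses the setup-only plan because its discounted future score is higher. In the paper's setting the scores would come from the LLM conditioned on the coverage map, not from constants.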
