2604.06946v1 Apr 08, 2026 cs.SE

An empirical study of LoRA-based fine-tuning of large language models for automated test case generation

Ke Yan

M. Moradi

David Colwell

Rhona Asgari

Abstract

Automated test case generation from natural language requirements remains a challenging problem in software engineering due to the ambiguity of requirements and the need to produce structured, executable test artifacts. Recent advances in LLMs have shown promise in addressing this task; however, their effectiveness depends on task-specific adaptation and efficient fine-tuning strategies. In this paper, we present a comprehensive empirical study on the use of parameter-efficient fine-tuning, specifically LoRA, for requirement-based test case generation. We evaluate multiple LLM families, including open-source and proprietary models, under a unified experimental pipeline. The study systematically explores the impact of key LoRA hyperparameters, including rank, scaling factor, and dropout, on downstream performance. We propose an automated evaluation framework based on GPT-4o, which assesses generated test cases across nine quality dimensions. Experimental results demonstrate that LoRA-based fine-tuning significantly improves the performance of all open-source models, with Ministral-8B achieving the best results among them. Furthermore, we show that a fine-tuned 8B open-source model can achieve performance comparable to pre-fine-tuned GPT-4.1 models, highlighting the effectiveness of parameter-efficient adaptation. While GPT-4.1 models achieve the highest overall performance, the performance gap between proprietary and open-source models is substantially reduced after fine-tuning. These findings provide important insights into model selection, fine-tuning strategies, and evaluation methods for automated test generation. In particular, they demonstrate that cost-efficient, locally deployable open-source models can serve as viable alternatives to proprietary systems when combined with well-designed fine-tuning approaches.
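The abstract names three LoRA hyperparameters (rank, scaling factor, dropout) without showing how they enter the computation. The following is a minimal pure-Python sketch of the standard LoRA formulation, in which a frozen weight W is augmented by a low-rank update scaled by alpha / r. It is not the paper's pipeline: the class name `LoRALinear`, the helper `matvec`, the initialization scale, and all example values are assumptions made for illustration only.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical illustration: frozen W plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r, alpha, dropout=0.0, seed=0):
        self.rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                # frozen pretrained weight, shape d_out x d_in
        self.r = r                # rank of the update
        self.alpha = alpha        # scaling factor
        self.dropout = dropout    # dropout probability on the low-rank path
        # A starts small and random, B starts at zero, so the initial update is zero.
        self.A = [[self.rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x, training=False):
        base = matvec(self.W, x)
        if training and self.dropout > 0.0:
            # Inverted dropout, applied only to the input of the low-rank path.
            keep = 1.0 - self.dropout
            x = [xi / keep if self.rng.random() < keep else 0.0 for xi in x]
        delta = matvec(self.B, matvec(self.A, x))
        scale = self.alpha / self.r
        return [b + scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Only A (r x d_in) and B (d_out x r) are trained."""
        d_out, d_in = len(self.W), len(self.W[0])
        return self.r * (d_in + d_out)


# Example values are arbitrary, chosen only to make the arithmetic easy to follow.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, r=2, alpha=4)
print(layer.forward([1, 2, 3, 4]))
print(layer.trainable_params())
```

Because B is initialized to zero, the adapted layer initially reproduces the frozen layer exactly. Raising the rank increases the number of trainable parameters linearly, while alpha / r controls how strongly the learned update contributes to the output.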
