2603.07927v1 Mar 09, 2026 cs.SE

SWE-Fuse: Improving Software Agent Performance through Issue-free Trajectory Learning and Entropy-based RLVR Training

SWE-Fuse: Empowering Software Agents via Issue-free Trajectory Learning and Entropy-aware RLVR Training

Binbin Chen

Peng Di

Xinyuan Wen

Haoxuan Lan

Hang Yu

Cuiyun Gao

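Large language models (LLMs) have brought major change to software engineering. A variety of LLM-based agents have recently been developed to resolve real-world software issues. Although these agents reach state-of-the-art performance, they face an important challenge: **a shortage of high-quality issue descriptions**. Real-world datasets often show mismatches between an issue description and its corresponding fix, which introduces noise and ambiguity for automated agents and limits their problem-solving ability. This work proposes **SWE-Fuse**, an issue-description-aware training framework. SWE-Fuse trains SWE agents on a combination of issue-description-guided samples and samples without issue descriptions. It consists of two core modules: (1) an issue-free trajectory learning module, which mitigates potentially misleading issue descriptions and lets the model learn a step-by-step debugging process, and (2) an entropy-aware RLVR training module. The second module adapts the training process to entropy: under high entropy it applies relaxed clipping to encourage exploration, and under low entropy it applies stricter clipping to keep training stable. SWE-Fuse was evaluated on the widely used SWE-bench Verified benchmark. On real-world software issues, it outperforms the best 8B and 32B models by 43.0% and 60.2%, respectively. Combining SWE-Fuse with test-time scaling (TTS) improves performance further, reaching solve rates of 49.8% and 65.2% under TTS@8 for the 8B and 32B models, respectively.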

Original Abstract

Large language models (LLMs) have transformed the software engineering landscape. Recently, numerous LLM-based agents have been developed to address real-world software issue fixing tasks. Despite achieving state-of-the-art performance, these agents face a significant challenge: insufficient high-quality issue descriptions. Real-world datasets often exhibit misalignments between issue descriptions and their corresponding solutions, introducing noise and ambiguity that mislead automated agents and limit their problem-solving effectiveness. We propose SWE-Fuse, an issue-description-aware training framework that fuses issue-description-guided and issue-free samples for training SWE agents. It consists of two key modules: (1) an issue-free-driven trajectory learning module for mitigating potentially misleading issue descriptions while enabling the model to learn step-by-step debugging processes; and (2) an entropy-aware RLVR training module, which adaptively adjusts training dynamics through entropy-driven clipping. It applies relaxed clipping under high entropy to encourage exploration, and stricter clipping under low entropy to ensure training stability. We evaluate SWE-Fuse on the widely studied SWE-bench Verified benchmark to demonstrate its effectiveness in solving real-world software problems. Specifically, SWE-Fuse outperforms the best 8B and 32B baselines by 43.0% and 60.2% in solve rate, respectively. Furthermore, integrating SWE-Fuse with test-time scaling (TTS) enables further performance improvements, achieving solve rates of 49.8% and 65.2% under TTS@8 for the 8B and 32B models, respectively.
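
The entropy-driven clipping idea in the abstract can be illustrated with a small sketch. The abstract does not give the actual formula, so everything below is an assumption made for illustration: the linear interpolation between a tight and a loose clip range, the function names `token_entropy`, `entropy_aware_epsilon`, and `clipped_surrogate`, the default values `eps_tight=0.1` and `eps_loose=0.3`, and the use of a PPO-style clipped surrogate. It shows only the qualitative behavior described: a wider clip range when the policy's entropy is high, a narrower one when it is low.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_aware_epsilon(entropy, max_entropy, eps_tight=0.1, eps_loose=0.3):
    """Hypothetical schedule: interpolate the clip range linearly with entropy.

    Low entropy  -> close to eps_tight (stricter clipping, more stability).
    High entropy -> close to eps_loose (relaxed clipping, more exploration).
    """
    frac = min(max(entropy / max_entropy, 0.0), 1.0)
    return eps_tight + frac * (eps_loose - eps_tight)


def clipped_surrogate(ratio, advantage, eps):
    """PPO-style clipped objective for a single token.

    ratio is pi_new(a|s) / pi_old(a|s); the ratio is clipped to
    [1 - eps, 1 + eps] and the pessimistic (smaller) value is kept.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    max_h = math.log(4)  # maximum entropy for a 4-token toy vocabulary
    examples = {
        "confident": [0.97, 0.01, 0.01, 0.01],
        "uncertain": [0.25, 0.25, 0.25, 0.25],
    }
    for name, probs in examples.items():
        eps = entropy_aware_epsilon(token_entropy(probs), max_h)
        objective = clipped_surrogate(ratio=1.5, advantage=1.0, eps=eps)
        print(f"{name}: eps={eps:.3f}, objective={objective:.3f}")
```

In this toy setup the same probability ratio is clipped less aggressively for the uncertain (high-entropy) distribution than for the confident one, so high-entropy tokens can contribute larger updates. A real RLVR implementation would compute these quantities over batches of token log-probabilities rather than per-token Python floats.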
