2602.11210v1 Feb 11, 2026 cs.SE

SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents

Danlong Yuan, Wei Wu, Zheng Wang, Xueliang Zhao, Huishuai Zhang, Dongyan Zhao

Abstract

Reinforcement learning (RL) has become a key paradigm for training software engineering (SWE) agents, but existing pipelines typically rely on per-task containers for isolation. At scale, pre-built container images incur substantial storage overhead, slow environment setup, and require container-management privileges. We propose SWE-MiniSandbox, a lightweight, container-free method that enables scalable RL training of SWE agents without sacrificing isolation. Instead of relying on per-instance containers, SWE-MiniSandbox executes each task in an isolated workspace backed by kernel-level mechanisms, substantially reducing system overhead. It leverages lightweight environment pre-caching techniques to eliminate the need for bulky container images. As a result, our approach lowers disk usage to approximately 5% of that required by container-based pipelines and reduces environment preparation time to about 25% of the container baseline. Empirical results demonstrate that SWE-MiniSandbox achieves evaluation performance comparable to standard container-based pipelines. By removing the dependency on heavy container infrastructure, SWE-MiniSandbox offers a practical and accessible foundation for scaling RL-based SWE agents, particularly in resource-constrained research environments.
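
To make the reported ratios concrete, the short sketch below scales a container baseline by the two figures quoted in the abstract (about 5% of the disk usage and about 25% of the environment preparation time). The function name `projected_footprint` and the baseline numbers are hypothetical, chosen only for illustration; they are not measurements from the paper.

```python
def projected_footprint(container_disk_gb: float,
                        container_setup_s: float,
                        disk_ratio: float = 0.05,
                        time_ratio: float = 0.25) -> dict:
    """Scale a container baseline by the approximate ratios from the abstract.

    disk_ratio and time_ratio default to the abstract's rounded figures
    (about 5% and about 25%); the baseline arguments are supplied by the caller.
    """
    return {
        "disk_gb": container_disk_gb * disk_ratio,
        "setup_s": container_setup_s * time_ratio,
        # Reduction factors implied by the ratios themselves.
        "disk_reduction_x": 1.0 / disk_ratio,
        "setup_speedup_x": 1.0 / time_ratio,
    }


if __name__ == "__main__":
    # Hypothetical baseline: 1000 GB of container images, 60 s per environment.
    print(projected_footprint(1000.0, 60.0))
```

Under these assumed baselines, the ratios correspond to roughly 50 GB of disk and 15 s of preparation per environment, which is about a 20x reduction in storage and a 4x reduction in setup time.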
