2603.13023v2 Mar 13, 2026 cs.SE

daVinci-Env: Automated Generation of Large-Scale Software Engineering Environments

daVinci-Env: Open SWE Environment Synthesis at Scale

Mohan Jiang

Citations: 34

h-index: 4

Yunze Wu

Citations: 83

h-index: 5

Ji Zeng

Citations: 35

h-index: 4

Dayuan Fu

Citations: 261

h-index: 4

Yaxing Huang

Citations: 11

h-index: 2

Shenyu Wu

Citations: 16

h-index: 2

Zerui Peng

Citations: 4

h-index: 1

Jie Sun

Citations: 10

h-index: 3

Lin Zhang

Citations: 60

h-index: 3

Yukun Li

Citations: 25

h-index: 3

Jiarui Hu

Citations: 6

h-index: 1

Liming Liu

Citations: 57

h-index: 4

Jinlong Hou

Citations: 18

h-index: 3

Pengfei Liu

Citations: 11

h-index: 3

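Training capable software engineering (SWE) agents requires large-scale, executable, and verifiable environments that provide dynamic feedback loops for iterative code editing, test execution, and solution refinement. Existing open-source datasets, however, are limited in scale and repository diversity, and industrial solutions rely on unreleased infrastructure, which puts them out of reach for most academic research groups. This paper introduces OpenSWE, the largest fully transparent framework for training SWE agents in Python. OpenSWE comprises 45,320 executable Docker environments spanning more than 12,800 repositories, and all Dockerfiles, evaluation scripts, and infrastructure are fully released to ensure reproducibility. It was built with a multi-agent synthesis pipeline deployed on a 64-node distributed cluster, which automates repository exploration, Dockerfile generation, evaluation script generation, and iterative test analysis. Beyond scale, the paper proposes a quality-centric filtering pipeline that characterizes the inherent difficulty of each environment. The pipeline removes environments that are either unsolvable or insufficiently challenging and keeps only those that maximize learning efficiency. Environment construction cost $891,000, and trajectory sampling and difficulty-aware curation cost a further $576,000, for a total investment of approximately $1.47 million. The result is about 13,000 curated trajectories drawn from roughly 9,000 quality-guaranteed environments. Extensive experiments validate the effectiveness of OpenSWE: OpenSWE-32B and OpenSWE-72B reach 62.4% and 66.0% on SWE-bench Verified, respectively, which is state-of-the-art (SOTA) among the Qwen2.5 series. In addition, SWE-focused training yields substantial gains of up to 12 points on mathematical reasoning and 5 points on science benchmarks, without degrading factual recall.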

Original Abstract

Training capable software engineering (SWE) agents demands large-scale, executable, and verifiable environments that provide dynamic feedback loops for iterative code editing, test execution, and solution refinement. However, existing open-source datasets remain limited in scale and repository diversity, while industrial solutions are opaque with unreleased infrastructure, creating a prohibitive barrier for most academic research groups. We present OpenSWE, the largest fully transparent framework for SWE agent training in Python, comprising 45,320 executable Docker environments spanning over 12.8k repositories, with all Dockerfiles, evaluation scripts, and infrastructure fully open-sourced for reproducibility. OpenSWE is built through a multi-agent synthesis pipeline deployed across a 64-node distributed cluster, automating repository exploration, Dockerfile construction, evaluation script generation, and iterative test analysis. Beyond scale, we propose a quality-centric filtering pipeline that characterizes the inherent difficulty of each environment, filtering out instances that are either unsolvable or insufficiently challenging and retaining only those that maximize learning efficiency. With $891K spent on environment construction and an additional $576K on trajectory sampling and difficulty-aware curation, the entire project represents a total investment of approximately $1.47 million, yielding about 13,000 curated trajectories from roughly 9,000 quality guaranteed environments. Extensive experiments validate OpenSWE's effectiveness: OpenSWE-32B and OpenSWE-72B achieve 62.4% and 66.0% on SWE-bench Verified, establishing SOTA among Qwen2.5 series. Moreover, SWE-focused training yields substantial out-of-domain improvements, including up to 12 points on mathematical reasoning and 5 points on science benchmarks, without degrading factual recall.
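The abstract says that environments which are either unsolvable or insufficiently challenging are filtered out, but it does not say how difficulty is measured. The sketch below is one plausible reading, not the authors' method: it assumes difficulty is approximated by the empirical pass rate over sampled trajectories, and the names `EnvStats` and `filter_by_difficulty`, as well as the default thresholds, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvStats:
    """Sampling results for one environment (hypothetical structure)."""
    env_id: str
    attempts: int    # number of sampled trajectories
    successes: int   # trajectories that passed the evaluation script

    @property
    def pass_rate(self) -> float:
        # An environment with no attempts is treated as never solved.
        return self.successes / self.attempts if self.attempts else 0.0


def filter_by_difficulty(stats, min_rate=0.0, max_rate=1.0):
    """Keep environments whose pass rate lies strictly between the bounds.

    With the defaults (assumed here, not taken from the paper), this drops
    environments that were never solved (pass rate 0) and environments that
    were always solved (pass rate 1).
    """
    return [s for s in stats if min_rate < s.pass_rate < max_rate]


if __name__ == "__main__":
    sampled = [
        EnvStats("env-unsolved", attempts=4, successes=0),
        EnvStats("env-mixed", attempts=4, successes=2),
        EnvStats("env-trivial", attempts=4, successes=4),
    ]
    for env in filter_by_difficulty(sampled):
        print(env.env_id, env.pass_rate)
```

Tightening `min_rate` and `max_rate` would narrow the retained set toward mid-difficulty environments; which band actually maximizes learning efficiency is an empirical question the abstract leaves to the full paper.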

4 Citations

0 Influential

2.5 Altmetric

16.5 Score

