2601.18418v2 Jan 26, 2026 cs.SE

daVinci-Dev: Agent-native Mid-training for Software Engineering

Yang Xiao (Citations: 3,308; h-index: 7)

Mohan Jiang (Citations: 21; h-index: 3)

Jie Sun (Citations: 25; h-index: 4)

Yunze Wu (Citations: 68; h-index: 5)

Lyumanshan Ye (Citations: 308; h-index: 7)

Tiantian Mi (Citations: 11; h-index: 2)

Ji Zeng (Citations: 25; h-index: 3)

Pengfei Liu (Citations: 289; h-index: 6)

Zhen Huang (Citations: 75; h-index: 4)

Dayuan Fu (Citations: 225; h-index: 4)

Xuefeng Li (Citations: 784; h-index: 12)

Yumin Zhuang (Citations: 21; h-index: 3)

Yaxing Huang (Citations: 6; h-index: 1)

Muhang Xie (Citations: 5; h-index: 1)

Qishuo Hua (Citations: 32; h-index: 3)

Han Wang (Citations: 262; h-index: 5)

Jifan Lin (Citations: 61; h-index: 5)

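Recent progress in large language models (LLMs) has shifted the frontier from single-turn code generation to agentic software engineering, a paradigm in which models autonomously navigate, edit, and test complex repositories. Post-training methods have become the de facto standard for code agents, but **agentic mid-training (MT)**, meaning mid-training on large-scale data that reflects authentic agentic workflows, remains underexplored because of its substantial resource requirements, even though it offers a more scalable way to acquire foundational agentic behaviors than relying on reinforcement learning alone. A key challenge in making agentic mid-training effective is the distribution mismatch between static training data and the dynamic, feedback-rich environment of real development. To address this, we present a systematic study of agentic mid-training and establish both the data synthesis principles and the training methodology for effective agent development. Central to our approach is **agent-native data**: supervision data composed of two complementary types of trajectories. **Contextually-native trajectories** preserve the complete information flow an agent experiences, giving broad coverage and diversity, while **environmentally-native trajectories** supply observations derived from actual tool invocations and test executions, giving depth and interaction authenticity. We verify the model's agentic capabilities on `SWE-Bench Verified`. Under two post-training settings that use the same base model and agentic scaffold, our approach outperforms `Kimi-Dev`, the previous open software engineering mid-training recipe, while using less than half the mid-training tokens (73.1B). Beyond this relative advantage, our best-performing 32B and 72B models reach resolution rates of **56.1%** and **58.5%**, respectively.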

Original Abstract

Recently, the frontier of Large Language Model (LLM) capabilities has shifted from single-turn code generation to agentic software engineering, a paradigm where models autonomously navigate, edit, and test complex repositories. While post-training methods have become the de facto approach for code agents, **agentic mid-training**, that is, mid-training (MT) on large-scale data that mirrors authentic agentic workflows, remains critically underexplored due to substantial resource requirements, despite offering a more scalable path to instilling foundational agentic behaviors than relying solely on expensive reinforcement learning. A central challenge in realizing effective agentic mid-training is the distribution mismatch between static training data and the dynamic, feedback-rich environment of real development. To address this, we present a systematic study of agentic mid-training, establishing both the data synthesis principles and training methodology for effective agent development at scale. Central to our approach is **agent-native data**: supervision comprising two complementary types of trajectories: **contextually-native trajectories** that preserve the complete information flow an agent experiences, offering broad coverage and diversity; and **environmentally-native trajectories** collected from executable repositories where observations stem from actual tool invocations and test executions, providing depth and interaction authenticity. We verify the model's agentic capabilities on `SWE-Bench Verified`. We demonstrate our superiority over the previous open software engineering mid-training recipe `Kimi-Dev` under two post-training settings with an aligned base model and agentic scaffold, while using less than half the mid-training tokens (73.1B). Besides this relative advantage, our best-performing 32B and 72B models achieve **56.1%** and **58.5%** resolution rates, respectively, which are ...

5 Citations

0 Influential

6 Altmetric

35.0 Score
