2602.07457v1 Feb 07, 2026 cs.SE

Pull Requests as a Training Signal for Repo-Level Code Editing

Qinglin Zhu

Tianyu Chen

Shuai Lu

Lei Ji

Runcong Zhao

Xiangxiang Dai

Yulan He

Lin Gui

Peng Cheng

Yeyun Gong

Murong Ma

Original Abstract

Repository-level code editing requires models to understand complex dependencies and execute precise multi-file modifications across a large codebase. While recent gains on SWE-bench rely heavily on complex agent scaffolding, it remains unclear how much of this capability can be internalised via high-quality training signals. To address this, we propose Clean Pull Request (Clean-PR), a mid-training paradigm that leverages real-world GitHub pull requests as a training signal for repository-level editing. We introduce a scalable pipeline that converts noisy pull request diffs into Search/Replace edit blocks through reconstruction and validation, resulting in the largest publicly available corpus of 2 million pull requests spanning 12 programming languages. Using this training signal, we perform a mid-training stage followed by an agentless-aligned supervised fine-tuning process with error-driven data augmentation. On SWE-bench, our model significantly outperforms the instruction-tuned baseline, achieving absolute improvements of 13.6% on SWE-bench Lite and 12.3% on SWE-bench Verified. These results demonstrate that repository-level code understanding and editing capabilities can be effectively internalised into model weights under a simplified, agentless protocol, without relying on heavy inference-time scaffolding.
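The abstract says noisy pull request diffs are converted into Search/Replace edit blocks "through reconstruction and validation" but does not spell out how. The sketch below illustrates one plausible reading, assuming the pre-PR and post-PR contents of each touched file are available: each changed hunk is rebuilt as a block with a little unchanged context, and a sample is kept only if every search string matches the old file exactly once and replaying the blocks reproduces the new file byte for byte. The names `EditBlock`, `diff_to_blocks`, `apply_blocks`, `validate`, and `render`, the one-line context window, and the `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` marker layout (borrowed from the style used by Aider and Agentless) are illustrative assumptions, not the authors' pipeline.

```python
import difflib
from dataclasses import dataclass


@dataclass
class EditBlock:
    """One Search/Replace edit: `search` must occur exactly once in the file."""
    search: str
    replace: str


def diff_to_blocks(old: str, new: str, context: int = 1) -> list[EditBlock]:
    """Reconstruct Search/Replace blocks from a file's pre- and post-PR contents.

    Each changed hunk is widened by up to `context` unchanged lines on each
    side so that its search text is more likely to be unique in `old`.
    """
    a = old.splitlines(keepends=True)
    b = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    blocks = []
    for group in matcher.get_grouped_opcodes(context):
        # Each opcode is (tag, i1, i2, j1, j2); a group covers one hunk plus context.
        i1, i2 = group[0][1], group[-1][2]
        j1, j2 = group[0][3], group[-1][4]
        blocks.append(EditBlock("".join(a[i1:i2]), "".join(b[j1:j2])))
    return blocks


def apply_blocks(old: str, blocks: list[EditBlock]) -> str:
    """Apply blocks in order, rejecting any whose search text is missing or ambiguous."""
    text = old
    for blk in blocks:
        hits = text.count(blk.search)
        if hits != 1:
            raise ValueError(f"search text matched {hits} times, expected 1")
        text = text.replace(blk.search, blk.replace, 1)
    return text


def validate(old: str, new: str, blocks: list[EditBlock]) -> bool:
    """Keep a sample only if replaying its blocks reproduces the post-PR file exactly."""
    try:
        return apply_blocks(old, blocks) == new
    except ValueError:
        return False


def render(path: str, blk: EditBlock) -> str:
    """Serialize a block in an assumed Aider/Agentless-style marker layout (display only)."""
    def nl(s: str) -> str:
        # Ensure each section ends with a newline so the markers start on their own line.
        return s if s.endswith("\n") or not s else s + "\n"
    return (f"### {path}\n<<<<<<< SEARCH\n{nl(blk.search)}"
            f"=======\n{nl(blk.replace)}>>>>>>> REPLACE\n")


if __name__ == "__main__":
    # Toy "pull request": fix a wrong operator in a hypothetical calc.py.
    old = "def add(a, b):\n    return a - b\n\n\nprint(add(1, 2))\n"
    new = old.replace("a - b", "a + b")
    blocks = diff_to_blocks(old, new)
    if validate(old, new, blocks):
        for blk in blocks:
            print(render("calc.py", blk))
```

Requiring each search string to match exactly once is a design choice in this sketch: an ambiguous anchor leaves the edit under-specified for a model that must emit it, so such samples are discarded rather than repaired.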
