2603.21489v1 Mar 23, 2026 cs.CL

Effective Strategies for Asynchronous Software Engineering Agents

Abstract

AI agents have become increasingly capable at isolated software engineering (SWE) tasks such as resolving issues on GitHub. Yet long-horizon tasks involving multiple interdependent subtasks still pose challenges in both accuracy and timely completion. A natural approach to solving these long-horizon tasks in a timely manner is asynchronous multi-agent collaboration, where multiple agents work on different parts of the task at the same time. But effective application of multi-agent systems has proven surprisingly difficult: concurrent edits by multiple agents interfere with each other, dependencies are difficult to synchronize, and combining partial progress into a coherent whole is challenging. Human developers, on the other hand, have long relied on mature collaboration infrastructure to manage these challenges in large software projects. Inspired by these collaboration primitives, we introduce Centralized Asynchronous Isolated Delegation (CAID), a structured multi-agent coordination paradigm grounded in three core SWE primitives: centralized task delegation, asynchronous execution, and isolated workspaces. CAID constructs dependency-aware task plans through a central manager, executes subtasks concurrently in isolated workspaces, and consolidates progress via structured integration with executable test-based verification. In empirical evaluation, we find that CAID improves accuracy over single-agent baselines by 26.7% absolute on paper reproduction tasks (PaperBench) and 14.3% on Python library development tasks (Commit0). Through systematic analysis, we find that branch-and-merge is a central coordination mechanism for multi-agent collaboration, and that SWE primitives such as git worktree, git commit, and git merge enable it to be realized in a reliable and executable manner.
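The abstract names the ingredients of CAID (a central manager, a dependency-aware plan, concurrent execution in isolated workspaces, and test-gated integration) but this page does not include the implementation. The following is a minimal sketch of how those pieces could fit together, not the authors' code. `Subtask`, `worktree_commands`, `run_plan`, the `agent/<name>` branch naming, and the `../wt-<name>` paths are all hypothetical names chosen for illustration; the `worker` and `verify` callables stand in for an agent run and a test-suite run.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Subtask:
    # Hypothetical plan entry: a name plus the subtasks it depends on.
    name: str
    deps: set[str] = field(default_factory=set)


def worktree_commands(task: Subtask, base: str = "main") -> list[list[str]]:
    """Git commands that would isolate one worker and later integrate it.

    Assumed layout: one branch and one worktree directory per subtask.
    The commands are only built here, never executed.
    """
    branch = f"agent/{task.name}"
    path = f"../wt-{task.name}"
    return [
        # Create a new branch from `base`, checked out in its own directory.
        ["git", "worktree", "add", "-b", branch, path, base],
        # The worker would edit and `git commit` inside `path`.
        # The manager would then integrate the branch from the base checkout.
        ["git", "merge", "--no-ff", branch],
    ]


async def run_plan(tasks, worker, verify):
    """Run subtasks concurrently, starting each once its deps are merged.

    worker: async callable taking a Subtask (stand-in for an agent).
    verify: callable taking a subtask name and returning bool
            (stand-in for running the tests after a merge).
    Returns the order in which subtasks were merged.
    """
    pending = {t.name: t for t in tasks}
    running: dict[asyncio.Task, str] = {}
    merged: list[str] = []

    while pending or running:
        # Launch every subtask whose dependencies have all been merged.
        ready = [n for n, t in pending.items() if t.deps <= set(merged)]
        for name in ready:
            running[asyncio.create_task(worker(pending.pop(name)))] = name

        if not running:
            # Nothing is running and nothing can start: the plan is stuck.
            raise ValueError(f"cycle or unmet dependency among {sorted(pending)}")

        done, _ = await asyncio.wait(
            set(running), return_when=asyncio.FIRST_COMPLETED
        )
        # Merge finished work one branch at a time, in a stable order.
        for finished in sorted(done, key=running.get):
            name = running.pop(finished)
            finished.result()  # re-raise any worker failure
            if not verify(name):
                raise RuntimeError(f"verification failed after merging {name}")
            merged.append(name)

    return merged
```

In this sketch, integration is serialized in the manager loop even though workers run concurrently, which mirrors the branch-and-merge idea the abstract highlights: parallel edits never touch the same checkout, and each merge is gated by a verification step before dependents are allowed to start.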
