2605.15040v1 May 14, 2026 cs.AI

Orchard: An Open-Source Agentic Modeling Framework

Xingdi Yuan

Baolin Peng

Tong Zhang

Qianhui Wu

Microsoft Research

Wenlin Yao

Jianfeng Gao

Tao Ge

Hao Cheng

Xiao Yu

Ruiyi Yang

Alessandrio Sordoni

Yelong Shen

Pengcheng He

Zhou Yu

Original Abstract

Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B, introduce credit-assignment SFT to learn from productive segments of unresolved trajectories, and apply Balanced Adaptive Rollout for RL. Starting from Qwen3-30B-A3B-Thinking, Orchard-SWE achieves 64.3% on SWE-bench Verified after SFT and 67.5% after SFT+RL, setting a new state of the art among open-source models of comparable size. Orchard-GUI trains a 4B vision-language computer-use agent using only 0.4K distilled trajectories and 2.2K open-ended tasks. It achieves 74.1%, 67.0%, and 64.0% success rates on WebVoyager, Online-Mind2Web, and DeepShop, respectively, making it the strongest open-source model while remaining competitive with proprietary systems. Orchard-Claw targets personal assistant agents. Trained with only 0.2K synthetic tasks, it achieves 59.6% pass@3 on Claw-Eval and 73.9% when paired with a stronger ZeroClaw harness. Collectively, these results show that a lightweight, open, harness-agnostic environment layer enables reusable agentic data, training recipes, and evaluations across domains.
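The abstract reports Orchard-Claw's Claw-Eval result as pass@3 but does not define the metric. The sketch below assumes Claw-Eval follows the common pass@k convention, which uses the unbiased estimator popularized by the HumanEval/Codex evaluation. For a task with n sampled attempts of which c succeed, pass@k = 1 - C(n-c, k) / C(n, k). The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and illustrative only. They do not come from Orchard or Claw-Eval, and Claw-Eval's exact protocol may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task (standard convention, assumed).

    n: total sampled attempts for the task
    c: number of those attempts that succeeded
    k: attempt budget being scored (must satisfy k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k attempts failed, every size-k subset contains a success.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset contains no success.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Benchmark-level score: average per-task pass@k over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # With exactly three runs per task, pass@3 reduces to
    # "at least one of the three runs succeeded".
    print(mean_pass_at_k([(3, 0), (3, 1), (3, 3)], k=3))
```

With n = k = 3, the first task scores 0 and the other two score 1, so the mean is 2/3. With k = 1, the estimator reduces to the plain success rate c/n.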
