arXiv:2602.01848v2 [cs.AI], Feb 02, 2026

ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems

Salaheddin Alzu'bi

Baran Nama

Arda Kaz

A. Eswaran

Weiyuan Chen

Sarvesh Khetan

R. Bala

Tu Vu

Sewoong Oh

Abstract

Current agentic frameworks underperform on long-horizon tasks. As reasoning depth increases, sequential orchestration becomes brittle, context windows impose hard limits that degrade performance, and opaque execution traces make failures difficult to localize or debug. We introduce ROMA (Recursive Open Meta-Agents), a domain-agnostic framework that addresses these limitations through recursive task decomposition and structured aggregation. ROMA decomposes goals into dependency-aware subtask trees that can be executed in parallel, while aggregation compresses and validates intermediate results to control context growth. Our framework standardizes agent construction around four modular roles -- Atomizer (which decides whether a task should be decomposed), Planner, Executor, and Aggregator -- which cleanly separate orchestration from model selection and enable transparent, hierarchical execution traces. This design supports heterogeneous multi-agent systems that mix models and tools according to cost, latency, and capability. To adapt ROMA to specific tasks without fine-tuning, we further introduce GEPA+, an improved Genetic-Pareto prompt proposer that searches over prompts within ROMA's component hierarchy while preserving interface contracts. We show that ROMA, combined with GEPA+, delivers leading system-level performance on reasoning and long-form generation benchmarks. On SEAL-0, which evaluates reasoning over conflicting web evidence, ROMA instantiated with GLM-4.6 improves accuracy by 9.9% over Kimi-Researcher. On EQ-Bench, a long-form writing benchmark, ROMA enables DeepSeek-V3 to match the performance of leading closed-source models such as Claude Sonnet 4.5. Our results demonstrate that recursive, modular agent architectures can scale reasoning depth while remaining interpretable, flexible, and model-agnostic.
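
The four-role loop described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the `solve` function, the `Trace` class, the `max_depth` guard, and the toy role functions are all hypothetical names chosen for illustration. The sketch also simplifies two things the abstract mentions: subtasks run sequentially rather than in parallel, and dependencies between subtasks are ignored.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trace:
    """One node of a hierarchical execution trace (hypothetical structure)."""
    goal: str
    depth: int
    result: str = ""
    children: List["Trace"] = field(default_factory=list)


def solve(
    goal: str,
    atomizer: Callable[[str], bool],
    planner: Callable[[str], List[str]],
    executor: Callable[[str], str],
    aggregator: Callable[[str, List[str]], str],
    depth: int = 0,
    max_depth: int = 3,
) -> Trace:
    """Recursively decompose `goal` and return the full trace tree."""
    node = Trace(goal=goal, depth=depth)
    # Atomizer: decide whether the task should be executed directly.
    # The depth guard is an assumption added here to bound recursion.
    if depth >= max_depth or atomizer(goal):
        node.result = executor(goal)
        return node
    # Planner: split the goal into subtasks (sequential in this sketch).
    for subgoal in planner(goal):
        child = solve(subgoal, atomizer, planner, executor, aggregator,
                      depth + 1, max_depth)
        node.children.append(child)
    # Aggregator: combine child results into one result for the parent,
    # so the parent never sees the children's full working context.
    node.result = aggregator(goal, [c.result for c in node.children])
    return node


# Toy stand-ins for the four roles; a real system would call models or tools.
def toy_atomizer(goal: str) -> bool:
    return " and " not in goal          # atomic if there is nothing to split


def toy_planner(goal: str) -> List[str]:
    return goal.split(" and ")


def toy_executor(goal: str) -> str:
    return goal.upper()


def toy_aggregator(goal: str, results: List[str]) -> str:
    return " | ".join(results)


if __name__ == "__main__":
    trace = solve("find sources and compare claims and write summary",
                  toy_atomizer, toy_planner, toy_executor, toy_aggregator)
    print(trace.result)
    for child in trace.children:
        print("  " * child.depth + child.goal, "->", child.result)
```

Because each role is passed in as a plain callable, a different model or tool could back each one without changing the control flow, which is the separation of orchestration from model selection that the abstract describes. The returned `Trace` tree mirrors the decomposition, so a failure can be located at the node where it occurred.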
