2602.07787v1 Feb 08, 2026 cs.AI

Do Multi-Agents Dream of Electric Screens? Achieving Perfect Accuracy on AndroidWorld Through Task Decomposition

Jean-Pierre Lo (Citations: 134, h-index: 1)

Clement Guiguet (Citations: 1, h-index: 1)

Charles Simon-Meunier (Citations: 1, h-index: 1)

Nicolas Dehandschoewercker (Citations: 1, h-index: 1)

Allen G. Roush (Citations: 142, h-index: 2)

Judah Goldfeder (Citations: 129, h-index: 7)

Ravid Shwartz-Ziv (Citations: 5,257, h-index: 21)

P. Favreau (Citations: 1, h-index: 1)

Original Abstract

We present Minitap, a multi-agent system that achieves 100% success on the AndroidWorld benchmark, the first to fully solve all 116 tasks and surpassing human performance (80%). We first analyze why single-agent architectures fail: context pollution from mixed reasoning traces, silent text input failures undetected by the agent, and repetitive action loops without escape. Minitap addresses each failure through targeted mechanisms: cognitive separation across six specialized agents, deterministic post-validation of text input against device state, and meta-cognitive reasoning that detects cycles and triggers strategy changes. Ablations show multi-agent decomposition contributes +21 points over single-agent baselines; verified execution adds +7 points; meta-cognition adds +9 points. We release Minitap as open-source software. https://github.com/minitap-ai/mobile-use
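Two of the mechanisms named in the abstract, post-validation of text input and cycle detection, can be illustrated with a minimal sketch. This is not Minitap's implementation: the names `verified_type` and `LoopDetector`, the callback signatures, and the window and repeat thresholds are all assumptions made for illustration, and the device is abstracted behind two plain callables.

```python
from collections import deque
from typing import Callable, Deque, Hashable, Tuple


def verified_type(
    type_text: Callable[[str], None],
    read_field: Callable[[], str],
    text: str,
    max_attempts: int = 3,
) -> bool:
    """Type `text`, then compare it with what the device reports.

    Assumption: `type_text` replaces the field's content and `read_field`
    returns the field's current content as seen in device state.
    Returns True once the two match, False after `max_attempts` mismatches.
    """
    for _ in range(max_attempts):
        type_text(text)
        if read_field() == text:
            return True
    return False


class LoopDetector:
    """Flags a likely loop when one (screen, action) pair keeps recurring.

    Assumption: a loop means the same pair appears at least `max_repeats`
    times within the last `window` recorded steps.
    """

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self.history: Deque[Tuple[Hashable, Hashable]] = deque(maxlen=window)
        self.max_repeats = max_repeats

    def record(self, screen_id: Hashable, action: Hashable) -> bool:
        """Record one step; return True if the caller should change strategy."""
        step = (screen_id, action)
        self.history.append(step)
        return self.history.count(step) >= self.max_repeats
```

In this sketch, an agent loop would call `verified_type` in place of a raw text-entry action, and would consult `LoopDetector.record` after every step to decide whether to replan.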

1 Citation

0 Influential

68.981063196732 Altmetric

345.9 Score
