2601.13060v1 Jan 19, 2026 cs.AI

MagicGUI-RMS: A Multi-Agent Reward Model System for Self-Evolving GUI Agents via Automated Feedback Reflux

Zhihui Cao (Citations: 28, h-index: 2)
Yudong Zhang (Citations: 1, h-index: 1)
Keying Qi (Citations: 1, h-index: 1)
Hengxin Wu (Citations: 1, h-index: 1)
Yuran Wang (Citations: 27, h-index: 1)
Guitao Fan (Citations: 1, h-index: 1)
Zhilin Gao (Citations: 13, h-index: 1)
He Yang (Citations: 13, h-index: 1)
Minqi Xiang (Citations: 13, h-index: 1)
Zuojiang Wang (Citations: 4, h-index: 1)
Wenke Huang (Citations: 29, h-index: 2)
Rui Wang (Citations: 14, h-index: 2)
Zeyu Zheng (Citations: 495, h-index: 7)
Hao Zhu (Citations: 127, h-index: 5)
Guokun Wu (Citations: 19, h-index: 2)
Yicong Liu (Citations: 1, h-index: 1)
Haikun Xu (Citations: 110, h-index: 4)
Zecheng Li (Citations: 121, h-index: 4)
Jian Zhao (Citations: 14, h-index: 3)
Xingyu Liu (Citations: 28, h-index: 3)

Abstract

Graphical user interface (GUI) agents are rapidly progressing toward autonomous interaction and reliable task execution across diverse applications. However, two central challenges remain unresolved: automating the evaluation of agent trajectories and generating high-quality training data at scale to enable continual improvement. Existing approaches often depend on manual annotation or static rule-based verification, which restricts scalability and limits adaptability in dynamic environments. We present MagicGUI-RMS, a multi-agent reward model system that delivers adaptive trajectory evaluation, corrective feedback, and self-evolving learning capabilities. MagicGUI-RMS integrates a Domain-Specific Reward Model (DS-RM) with a General-Purpose Reward Model (GP-RM), enabling fine-grained action assessment and robust generalization across heterogeneous GUI tasks. To support reward learning at scale, we design a structured data construction pipeline that automatically produces balanced and diverse reward datasets, effectively reducing annotation costs while maintaining sample fidelity. During execution, the reward model system identifies erroneous actions, proposes refined alternatives, and continuously enhances agent behavior through an automated data-reflux mechanism. Extensive experiments demonstrate that MagicGUI-RMS yields substantial gains in task accuracy and behavioral robustness. These results establish MagicGUI-RMS as a principled and effective foundation for building self-improving GUI agents driven by reward-based adaptation.
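
The execution-time loop described in the abstract (identify an erroneous action, propose a refined alternative, feed the correction back as data) can be illustrated with a toy sketch. The abstract does not say how DS-RM and GP-RM are combined or what the reflux data looks like, so everything below is an assumption made for illustration: the names `Step`, `Verdict`, `RewardSystem`, `toy_ds_rm`, and `toy_gp_rm`, the "accept only if both models accept" rule, and the record layout are all hypothetical, and the two "models" are simple rule stubs rather than learned reward models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    instruction: str  # what the agent was asked to do
    action: str       # what the agent proposed to do


@dataclass
class Verdict:
    correct: bool
    suggestion: Optional[str] = None  # refined alternative, if any


# A reward model is modeled here as any callable from Step to Verdict.
RewardModel = Callable[[Step], Verdict]


@dataclass
class RewardSystem:
    ds_rm: RewardModel  # stand-in for the domain-specific reward model
    gp_rm: RewardModel  # stand-in for the general-purpose reward model
    reflux: List[Dict[str, str]] = field(default_factory=list)

    def evaluate(self, step: Step) -> Verdict:
        ds = self.ds_rm(step)
        gp = self.gp_rm(step)
        # Assumption: an action is accepted only when both models accept it.
        correct = ds.correct and gp.correct
        suggestion = None if correct else (ds.suggestion or gp.suggestion)
        if suggestion is not None:
            # Assumption: a correction is stored as a rejected/chosen pair
            # that a later training round could consume.
            self.reflux.append({
                "instruction": step.instruction,
                "rejected": step.action,
                "chosen": suggestion,
            })
        return Verdict(correct, suggestion)


# Hypothetical lookup table standing in for domain knowledge.
EXPECTED = {"Open Settings": "click(Settings)"}


def toy_ds_rm(step: Step) -> Verdict:
    expected = EXPECTED.get(step.instruction)
    if expected is None or expected == step.action:
        return Verdict(True)
    return Verdict(False, expected)


def toy_gp_rm(step: Step) -> Verdict:
    # Generic sanity check: the action must not be empty.
    return Verdict(bool(step.action.strip()))


if __name__ == "__main__":
    system = RewardSystem(toy_ds_rm, toy_gp_rm)
    print(system.evaluate(Step("Open Settings", "click(Settings)")))
    print(system.evaluate(Step("Open Settings", "click(Camera)")))
    print(system.reflux)
```

In this sketch only rejected actions that come with a suggested replacement are written to `reflux`; whether the actual system also recycles accepted steps is not stated in the abstract.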

Citations: 1, Influential: 0, Altmetric: 3.5, Score: 18.5
