2602.15384v1 Feb 17, 2026 cs.AI

World-Model-Augmented Web Agents with Action Correction

Xiyun Li

Juncheng Li

Zhouzhou Shen

Xueyu Hu

Tianqing Fang

Tencent AI Lab

Shengyu Zhang

Abstract

Web agents based on large language models have shown promising capability in automating web tasks. However, current web agents struggle to reason out sensible actions because they are limited in predicting environment changes, and they may lack comprehensive awareness of execution risks, prematurely performing risky actions that cause losses and lead to task failure. To address these challenges, we propose WAC, a web agent that integrates model collaboration, consequence simulation, and feedback-driven action refinement. To overcome the cognitive isolation of individual models, we introduce a multi-agent collaboration process in which an action model consults a world model, acting as a web-environment expert, for strategic guidance; the action model then grounds these suggestions into executable actions, leveraging prior knowledge of environmental state-transition dynamics to improve candidate action proposals. To achieve risk-aware, resilient task execution, we introduce a two-stage deduction chain: a world model specialized in environmental state transitions simulates action outcomes, and a judge model scrutinizes these simulated outcomes and issues corrective feedback on the action when necessary. Experiments show that WAC achieves absolute gains of 1.8% on VisualWebArena and 1.3% on Online-Mind2Web.
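
The control flow described in the abstract (consult, propose, simulate, judge, correct) can be sketched roughly as below. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `advise`, `propose`, `simulate`, and `judge`, their signatures, the `Verdict` type, and the `max_retries` budget are all hypothetical and do not come from the paper. The stub models in `demo` are toy rules standing in for real language models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Judge output (hypothetical type): accept the action or return feedback."""
    ok: bool
    feedback: Optional[str] = None


def choose_action(
    state: str,
    task: str,
    advise: Callable[[str, str], str],                       # world model as advisor (assumed)
    propose: Callable[[str, str, str, Optional[str]], str],  # action model (assumed)
    simulate: Callable[[str, str], str],                     # world model as simulator (assumed)
    judge: Callable[[str, str, str], Verdict],               # judge model (assumed)
    max_retries: int = 2,                                    # assumed budget, not from the paper
) -> str:
    # Collaboration step: the action model asks the world model for guidance.
    guidance = advise(state, task)
    feedback: Optional[str] = None
    action = ""
    for _ in range(max_retries + 1):
        # The action model grounds the guidance (and any feedback) into an action.
        action = propose(state, task, guidance, feedback)
        # Two-stage deduction: simulate the outcome, then let the judge inspect it.
        predicted = simulate(state, action)
        verdict = judge(task, action, predicted)
        if verdict.ok:
            return action
        feedback = verdict.feedback
    # Budget exhausted: fall back to the last proposal.
    return action


def demo(max_retries: int = 2) -> str:
    """Toy run with rule-based stubs in place of real models."""
    def advise(state: str, task: str) -> str:
        return "search before buying"

    def propose(state: str, task: str, guidance: str, feedback: Optional[str]) -> str:
        # The stub only changes its mind once it has received feedback.
        return "click('Search')" if feedback else "click('Buy now')"

    def simulate(state: str, action: str) -> str:
        return "order placed" if "Buy" in action else "results shown"

    def judge(task: str, action: str, predicted: str) -> Verdict:
        if predicted == "order placed":
            return Verdict(False, "irreversible purchase; search first")
        return Verdict(True)

    return choose_action("home page", "find a red mug",
                         advise, propose, simulate, judge, max_retries)
```

In this toy run the first proposal is rejected because its simulated outcome is a purchase, and the second proposal, made with the judge's feedback, is accepted. Returning the last proposal when the retry budget runs out is one possible design choice; a real agent might instead stop or ask for confirmation.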
