2601.22149v1 Jan 29, 2026 cs.CL

DynaWeb: Model-Based Reinforcement Learning of Web Agents

Peidong Liu

Lynn Ai

Eric Yang

Meng Cao

Han Ding

Junqiao Wang

Z. Ji

Rongzhao Zhang

Tianyu Shi

Lei Yu

Abstract

The development of autonomous web agents, powered by Large Language Models (LLMs) and reinforcement learning (RL), represents a significant step towards general-purpose AI assistants. However, training these agents is severely hampered by the challenges of interacting with the live internet, which is inefficient, costly, and fraught with risks. Model-based reinforcement learning (MBRL) offers a promising solution by learning a world model of the environment to enable simulated interaction. This paper introduces DynaWeb, a novel MBRL framework that trains web agents through interacting with a web world model trained to predict naturalistic web page representations given agent actions. This model serves as a synthetic web environment where an agent policy can dream by generating vast quantities of rollout action trajectories for efficient online reinforcement learning. Beyond free policy rollouts, DynaWeb incorporates real expert trajectories from training data, which are randomly interleaved with on-policy rollouts during training to improve stability and sample efficiency. Experiments conducted on the challenging WebArena and WebVoyager benchmarks demonstrate that DynaWeb consistently and significantly improves the performance of state-of-the-art open-source web agent models. Our findings establish the viability of training web agents through imagination, offering a scalable and efficient path toward online agentic RL.
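The abstract describes two ingredients: rolling a policy out inside a learned web world model instead of the live internet, and randomly interleaving real expert trajectories with those imagined on-policy rollouts. The Python sketch below illustrates only that data-collection loop, under heavy simplifying assumptions. `ToyWorldModel` is a hypothetical lookup table standing in for the learned world model, which the paper describes as trained to predict web page representations. Pages and actions are plain strings, and `"stop"` is a hypothetical terminal action. `expert_ratio` is a hypothetical mixing parameter. The reward signal and the policy update are omitted. This is not the authors' implementation.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical simplifications: a "page" is a plain string standing in for a web
# page representation, and an "action" is a string such as "click[search]".
Page = str
Action = str
Trajectory = List[Tuple[Page, Action]]
Policy = Callable[[Page], Action]


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a learned web world model.

    A real world model would be a trained model predicting the next page
    representation; here a lookup table maps (page, action) to the next page.
    """
    transitions: Dict[Tuple[Page, Action], Page] = field(default_factory=dict)

    def step(self, page: Page, action: Action) -> Page:
        # Unknown (page, action) pairs leave the simulated page unchanged.
        return self.transitions.get((page, action), page)


def imagine_rollout(policy: Policy, world_model: ToyWorldModel,
                    start_page: Page, horizon: int) -> Trajectory:
    """Roll the policy out inside the world model instead of the live web."""
    trajectory: Trajectory = []
    page = start_page
    for _ in range(horizon):
        action = policy(page)
        trajectory.append((page, action))
        if action == "stop":  # hypothetical terminal action ends the episode
            break
        page = world_model.step(page, action)
    return trajectory


def build_batch(policy: Policy, world_model: ToyWorldModel,
                start_pages: List[Page], expert_trajs: List[Trajectory],
                batch_size: int, expert_ratio: float, horizon: int,
                rng: random.Random) -> List[Tuple[str, Trajectory]]:
    """Randomly interleave real expert trajectories with imagined rollouts.

    Each slot independently takes an expert trajectory with probability
    `expert_ratio` (a hypothetical parameter), otherwise a fresh imagined rollout.
    """
    batch: List[Tuple[str, Trajectory]] = []
    for _ in range(batch_size):
        if expert_trajs and rng.random() < expert_ratio:
            batch.append(("expert", rng.choice(expert_trajs)))
        else:
            start = rng.choice(start_pages)
            batch.append(("imagined",
                          imagine_rollout(policy, world_model, start, horizon)))
    return batch


def toy_policy(page: Page) -> Action:
    # Hypothetical fixed policy; a real agent would be an LLM-based policy.
    return {"home": "click[search]", "search": "type[query]"}.get(page, "stop")


if __name__ == "__main__":
    wm = ToyWorldModel({("home", "click[search]"): "search",
                        ("search", "type[query]"): "results"})
    expert = [[("home", "click[search]"), ("search", "type[query]"),
               ("results", "stop")]]
    batch = build_batch(toy_policy, wm, ["home"], expert, batch_size=8,
                        expert_ratio=0.25, horizon=5, rng=random.Random(0))
    for source, traj in batch:
        print(source, traj)
    # A policy update on `batch` (omitted in this sketch) would follow here.
```

Sampling each batch slot independently is only one simple reading of "randomly interleaved". The abstract does not specify DynaWeb's actual mixing schedule, reward signal, or RL objective, and this sketch does not model them.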