2604.15309v1 Apr 16, 2026 cs.CV

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Yifan Yang (Citations: 107; h-index: 6)
Chong Luo (Citations: 882; h-index: 10)
Ning Liao (Citations: 13; h-index: 2)
Lili Qiu (Citations: 2,386; h-index: 20)
Yuqing Yang (Citations: 2,530; h-index: 20)
Yan Li (Citations: 330; h-index: 9)
Weiwei Guo (Citations: 40; h-index: 2)
Qiuchao Dai (Citations: 0; h-index: 0)
Ji Li (Citations: 3; h-index: 1)
Lijuan Wang (Citations: 303; h-index: 4)
Zezi Zeng (Citations: 3; h-index: 1)
Mingxin Cheng (Citations: 11; h-index: 3)
Zhendong Wang (Citations: 4,988; h-index: 20)
Zhengyuan Yang (Citations: 107; h-index: 3)
Xue Yang (Citations: 38; h-index: 2)

Original Abstract

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.
