2606.09826v1 Jun 08, 2026 cs.CV

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Yiyu Wang

Wei Huang

Xiaojuan Qi

Mingxian Lin

Shengju Qian

Yuqi Liu

Yi-Hua Huang

Yitang Li

Fan Zhang

Zeyu Hu

Lingting Zhu

Xin Wang

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs, open-weight VLMs, and specialized game policies) on the same footing. We address these gaps with OmniGameArena, a real-time benchmark of twelve newly built Unreal Engine 5 games spanning Solo (7), PvP (3), and Coop (2) with unified action interfaces, and the Improvement Dynamics Curve (IDC), an agentic-reflection harness in which a tool-using reflector LLM autonomously refines a bounded skill prompt across multiple rounds. Beyond cold-start leaderboard scores, IDC exposes two additional observables for each (agent, game) pair: how the score evolves across reflection rounds, and how the learned skill behaves on held-out task variants. We report these observables for twelve VLM agents on the cold-start leaderboard and four top agents under IDC.
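
The reflection loop described above can be illustrated with a minimal sketch. It is not the authors' harness: the function names (`run_idc`, `play`, `reflect`, `play_heldout`), the character-based bound on the skill prompt, and the truncation rule are all assumptions made for illustration, and the agent, reflector LLM, and games are replaced by toy stand-ins. It only shows the shape of the two observables: a per-round score curve starting from a cold-start score, and a held-out score for the final skill.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IDCResult:
    round_scores: List[float]  # index 0 is the cold-start score (empty skill)
    final_skill: str           # skill prompt after the last reflection round
    heldout_score: float       # score of the final skill on a held-out variant


def run_idc(
    play: Callable[[str], float],            # hypothetical: play the game with a skill prompt
    reflect: Callable[[str, float], str],    # hypothetical: reflector proposes a revised skill
    play_heldout: Callable[[str], float],    # hypothetical: play a held-out task variant
    rounds: int = 3,
    max_skill_chars: int = 2000,             # assumed form of the "bounded" skill prompt
) -> IDCResult:
    skill = ""
    scores = [play(skill)]  # cold-start score, before any reflection
    for _ in range(rounds):
        proposal = reflect(skill, scores[-1])
        skill = proposal[:max_skill_chars]  # enforce the bound by truncation (assumption)
        scores.append(play(skill))
    return IDCResult(scores, skill, play_heldout(skill))


# Toy stand-ins so the sketch runs without any model or game engine.
def toy_play(skill: str) -> float:
    return float(skill.count("tip"))


def toy_reflect(skill: str, last_score: float) -> str:
    return skill + "tip;"


def toy_heldout(skill: str) -> float:
    return 0.5 * skill.count("tip")


result = run_idc(toy_play, toy_reflect, toy_heldout, rounds=3, max_skill_chars=8)
print(result.round_scores, result.heldout_score)
```

In this toy run the score curve flattens once the skill prompt reaches its size bound, which is one reason a per-round curve carries more information than a single first-attempt score.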
