2606.09826v1 Jun 08, 2026 cs.CV

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Yiyu Wang
Wei Huang
Xiaojuan Qi
Mingxian Lin
Shengju Qian
Yuqi Liu
Yi-Hua Huang
Yitang Li
Fan Zhang
Zeyu Hu
Lingting Zhu
Xin Wang

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs, open-weight VLMs, and specialized game policies) on the same footing. We address these gaps with OmniGameArena, a real-time benchmark of twelve newly built Unreal Engine 5 games spanning Solo (7), PvP (3), and Coop (2) with unified action interfaces, and the Improvement Dynamics Curve (IDC), an agentic-reflection harness in which a tool-using reflector LLM autonomously refines a bounded skill prompt across multiple rounds. Beyond cold-start leaderboard scores, IDC exposes two additional observables for each (agent, game) pair: how the score evolves across reflection rounds, and how the learned skill behaves on held-out task variants. We report these observables for twelve VLM agents on the cold-start leaderboard and four top agents under IDC.
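The reflection loop described in the abstract can be pictured with a minimal sketch. This is not the authors' harness: the function names (`improvement_dynamics_curve`, `play`, `reflect`), the character-based bound on the skill prompt, and the toy stand-ins for the agent and the reflector are all assumptions made for illustration. It only shows the shape of the two observables: a per-round score curve starting from the cold-start score, and scores on held-out task variants using the final skill.

```python
from typing import Callable, List, Tuple


def improvement_dynamics_curve(
    play: Callable[[str, str], float],     # hypothetical: (skill_prompt, task) -> score
    reflect: Callable[[str, float], str],  # hypothetical: (skill_prompt, last_score) -> revised prompt
    train_task: str,
    held_out_tasks: List[str],
    rounds: int = 3,
    max_prompt_chars: int = 200,           # assumed form of the "bounded" skill prompt
) -> Tuple[List[float], List[float]]:
    """Return (score per round, scores on held-out variants)."""
    skill = ""                             # empty skill prompt = cold start
    curve = [play(skill, train_task)]      # round 0 is the cold-start score
    for _ in range(rounds):
        # The reflector proposes a revision; the bound is enforced by truncation.
        skill = reflect(skill, curve[-1])[:max_prompt_chars]
        curve.append(play(skill, train_task))
    # Evaluate the final skill on task variants never seen during reflection.
    held_out = [play(skill, task) for task in held_out_tasks]
    return curve, held_out


# Toy stand-ins (not real agents): the score counts "tip" entries in the prompt,
# and the reflector appends one tip per round.
def toy_play(skill: str, task: str) -> float:
    return float(skill.count("tip"))


def toy_reflect(skill: str, last_score: float) -> str:
    return skill + "tip;"


curve, held_out = improvement_dynamics_curve(
    toy_play, toy_reflect, "solo-level-1", ["variant-a", "variant-b"],
    rounds=3, max_prompt_chars=8,
)
print(curve, held_out)
```

In this toy run the curve flattens once the prompt reaches its length bound, which is the kind of behavior a per-round curve makes visible and a single first-attempt score does not.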


