2606.16262v1 Jun 15, 2026 cs.SE

UXBench: Measuring the Actionability of LLM-Generated UX Critiques

Xiangliang Zhang
Hang Hua
Wenjie Wang
Zipeng Ling
Yue Huang
Shiyi Du
Yuexing Hao
Xiaomin Li
Han Bao
Yuchen Ma
Yanfang Ye
Xiaonan Luo
Yu Jiang
Di Wang

Large language models (LLMs) are increasingly deployed as UX judges that inspect interfaces, diagnose usability problems, and propose repairs. Yet no controlled benchmark measures whether the resulting critiques are reliable and actionable across heterogeneous product surfaces. We introduce UXBench, a benchmark for evaluating LLMs as interaction-grounded UX judges. UXBench comprises local-first runnable web fixtures spanning ten product-surface families, paired with coverage-gated browser exploration that forces models to collect interaction evidence before reporting. Each judge model produces a structured UX report over seven rubric dimensions; report quality is measured by whether a fixed downstream repair agent can improve the interface based on the critique. We evaluate eight frontier models under both an automated repair-lift protocol and a blind human validation study. Results show that UX judging is neither saturated nor one-dimensional: models differ meaningfully in report actionability, exhibit distinct rubric-level repair signatures, vary in fixture-level reliability, and trade leadership across surface categories.
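
To make the two mechanisms named in the abstract concrete, the following is a minimal sketch rather than the benchmark's actual implementation. It assumes that coverage gating means a judge may report only after exercising a minimum fraction of a fixture's required interactions, and that repair lift is the mean change in an interface quality score after the repair agent acts on a critique. The names `RepairOutcome`, `coverage_gate`, and `repair_lift`, the 0.8 threshold, and the fixture names and scores are all hypothetical; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RepairOutcome:
    """Scores for one fixture before and after a critique-guided repair."""
    fixture: str          # hypothetical fixture name
    score_before: float   # assumed quality score of the original interface
    score_after: float    # assumed quality score after the repair agent runs


def coverage_gate(visited: set[str], required: set[str],
                  threshold: float = 0.8) -> bool:
    """Return True once enough required interactions have been exercised.

    The threshold value is an assumption made for illustration.
    """
    if not required:
        return True
    return len(visited & required) / len(required) >= threshold


def repair_lift(outcomes: list[RepairOutcome]) -> float:
    """Mean score improvement across fixtures (an assumed definition)."""
    return mean(o.score_after - o.score_before for o in outcomes)


# Illustrative values only; these are not results from the paper.
required = {"open_menu", "submit_form", "close_dialog"}
visited = {"open_menu", "submit_form"}
may_report = coverage_gate(visited, required)

outcomes = [
    RepairOutcome("checkout", 0.50, 0.75),
    RepairOutcome("settings", 0.25, 0.25),
]
lift = repair_lift(outcomes)
```

Under these assumptions, the judge in the example would be sent back to explore further, since it has covered only two of the three required interactions, and a critique that leaves a fixture unchanged contributes zero lift.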

