2606.12217v1 Jun 10, 2026 cs.CV

Making Foresight Actionable: Repurposing Representation Alignment in World Action Models

Yi Chen
Yuying Ge
Yixiao Ge
Lu Qiu
Xihui Liu
Yizhuo Li

World Action Models (WAMs) offer a promising route for robot manipulation: they use video generation models to predict how a scene will evolve before producing control actions. However, our empirical observations reveal a failure mode: generating plausible visual futures does not guarantee that accurate actions can be extracted from them. To diagnose this failure, we conduct action-head attention analysis and causal interventions. We find that the action decoder fails to focus on task-relevant interaction regions and remains sensitive to perturbations in task-irrelevant areas. This reveals a representation mismatch: hidden states optimized for visual reconstruction are not inherently organized in a form useful for low-level action control. In this paper, we propose AGRA, an Action-Grounded Representation Alignment objective that regularizes the world-action interface by aligning intermediate video diffusion features with spatially coherent semantic representations from a foundation visual encoder. We evaluate AGRA on real-world manipulation tasks. Experiments show that AGRA makes world model representations more action-grounded: by focusing the action decoder on the correct interaction regions, it improves object localization accuracy and affordance understanding, and it makes the policy more robust to perturbations in task-irrelevant regions. As a result, AGRA consistently improves both in-distribution performance and out-of-distribution generalization over the baseline world action model.
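The abstract does not give the exact form of the alignment objective, so the following is only a minimal sketch of one common way such a term is written: a learned linear projection maps each intermediate feature patch into the encoder's feature space, and the loss is one minus the mean per-patch cosine similarity to the frozen encoder's features. All names here (`project`, `alignment_loss`, `total_loss`, the weight `lam`) are hypothetical and are not taken from the paper; plain Python lists stand in for tensors.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def project(hidden, weight):
    """Linear projection; `weight` is a list of rows, each of len(hidden)."""
    return [sum(w * x for w, x in zip(row, hidden)) for row in weight]


def alignment_loss(hidden_patches, target_patches, weight):
    """One minus the mean per-patch cosine similarity (assumed form).

    hidden_patches: intermediate features of the video model, one vector per patch.
    target_patches: features from a frozen visual encoder for the same patches.
    weight: hypothetical learned projection into the encoder's feature space.
    """
    if len(hidden_patches) != len(target_patches):
        raise ValueError("patch counts must match")
    sims = [
        cosine(project(h, weight), t)
        for h, t in zip(hidden_patches, target_patches)
    ]
    return 1.0 - sum(sims) / len(sims)


def total_loss(base_loss, align_loss, lam=0.5):
    """Base training loss plus the weighted alignment term (lam is an assumption)."""
    return base_loss + lam * align_loss


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    hidden = [[1.0, 2.0], [3.0, -1.0]]
    targets = [[2.0, 1.0], [3.0, -1.0]]
    print(alignment_loss(hidden, targets, identity))
```

Under this assumed form the term lies between 0 (every projected patch points in the same direction as its target) and 2 (every patch points in the opposite direction), so it constrains feature direction per spatial location rather than feature magnitude.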


