2606.11683v1 Jun 10, 2026 cs.CV

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Xiaofeng Cao
Chaofan Ma
Zhenjie Mao
Yu-Hao Yang
Fanqin Zeng
Yue Shi
Yingjie Zhou
Jiangchao Yao

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue that spatial reasoning should be revisitable: conclusions formed under limited evidence should remain open to revision when complementary viewpoints become available. Building on this insight, we propose Reason, then Re-reason (ReRe), a training-free, inference-time framework with two phases: in the Reason Phase, an MLLM forms a spatial hypothesis from the original video; in the Re-reason Phase, it verifies or revises the hypothesis by observing a synthesized novel-view video. To enable effective cross-view revisiting, we design a Geometry-to-Video pipeline that renders strategically complementary novel views from predicted 3D geometry. These views feature an elevated, oblique perspective with scene-spanning coverage, while preserving the MLLM's native video interface without architectural modifications. Extensive evaluations on VSI-Bench and STI-Bench demonstrate that ReRe substantially boosts open-source MLLMs to rival proprietary state-of-the-art performance. Project page: https://zhenjiemao.github.io/ReRe/
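
The two-phase control flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ask_mllm` and `render_novel_view` are hypothetical callables standing in for the MLLM and the Geometry-to-Video pipeline, the follow-up prompt wording is invented, and the abstract does not say whether the original video is shown again in the second phase (this sketch passes only the novel view).

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumption: a video is represented as a sequence of frame identifiers.
Video = Sequence[str]


@dataclass
class ReReResult:
    hypothesis: str    # answer formed in the Reason Phase
    final_answer: str  # answer after the Re-reason Phase
    revised: bool      # True if the second phase changed the answer


def reason_then_rereason(
    question: str,
    video: Video,
    ask_mllm: Callable[[str, Video], str],          # hypothetical MLLM wrapper
    render_novel_view: Callable[[Video], Video],    # hypothetical view synthesizer
) -> ReReResult:
    # Reason Phase: form a spatial hypothesis from the original video.
    hypothesis = ask_mllm(question, video)

    # Synthesize a complementary novel-view video (stand-in for the
    # Geometry-to-Video pipeline; the 3D prediction itself is not modeled here).
    novel_view = render_novel_view(video)

    # Re-reason Phase: verify or revise the hypothesis against the new view.
    # The prompt wording below is an assumption made for illustration.
    followup = (
        f"{question}\n"
        f"Earlier answer: {hypothesis}\n"
        "Verify or revise this answer using the new viewpoint."
    )
    final_answer = ask_mllm(followup, novel_view)

    return ReReResult(hypothesis, final_answer, final_answer != hypothesis)
```

Because both phases go through the same `ask_mllm(prompt, video)` call, the sketch mirrors the abstract's point that the novel view is delivered through the model's native video interface, with no architectural change and no training step.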
