2606.11683v1 Jun 10, 2026 cs.CV

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Xiaofeng Cao

Chaofan Ma

Zhenjie Mao

Yu-Hao Yang

Fanqin Zeng

Yue Shi

Yingjie Zhou

Jiangchao Yao

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue that spatial reasoning should be revisitable: conclusions formed under limited evidence should remain open to revision when complementary viewpoints become available. Building on this insight, we propose Reason, then Re-reason (ReRe), a training-free, inference-time framework with two phases: in the Reason Phase, an MLLM forms a spatial hypothesis from the original video; in the Re-reason Phase, it verifies or revises the hypothesis by observing a synthesized novel-view video. To enable effective cross-view revisiting, we design a Geometry-to-Video pipeline that renders strategically complementary novel views from predicted 3D geometry. These views feature an elevated, oblique perspective with scene-spanning coverage, while preserving the MLLM's native video interface without architectural modifications. Extensive evaluations on VSI-Bench and STI-Bench demonstrate that ReRe substantially boosts open-source MLLMs to rival proprietary state-of-the-art performance. Project page: https://zhenjiemao.github.io/ReRe/
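
The two-phase flow described in the abstract can be sketched as a short control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `ask_mllm` and `synthesize_novel_view` are hypothetical callables standing in for the MLLM and the Geometry-to-Video pipeline, the prompt wording is invented, and passing only the novel-view video to the second call is an assumption about how the Re-reason Phase is fed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Placeholder type: a video is treated as an opaque sequence of frames.
Video = Sequence[object]


@dataclass
class ReReResult:
    hypothesis: str  # answer formed in the Reason Phase
    final: str       # answer after the Re-reason Phase
    revised: bool    # True if the second pass changed the answer


def rere(
    question: str,
    video: Video,
    ask_mllm: Callable[[str, Video], str],            # hypothetical MLLM wrapper
    synthesize_novel_view: Callable[[Video], Video],  # hypothetical renderer
) -> ReReResult:
    # Reason Phase: form a spatial hypothesis from the original video.
    hypothesis = ask_mllm(question, video)

    # Geometry-to-Video step (stubbed): render a complementary viewpoint.
    novel_view = synthesize_novel_view(video)

    # Re-reason Phase: verify or revise the hypothesis on the novel view.
    # The prompt text below is illustrative only.
    prompt = (
        f"{question}\n"
        f"Initial answer from the original video: {hypothesis}\n"
        "Verify or revise this answer using the new viewpoint."
    )
    final = ask_mllm(prompt, novel_view)

    return ReReResult(hypothesis, final, revised=(final != hypothesis))
```

Because both components are passed in as plain callables, the sketch is training-free in the same sense as the abstract: nothing about the model changes, and the second call reuses the same video interface as the first.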
