2605.26525v1 May 26, 2026 cs.CV

ReCA: Multi-Shot Long Video Extrapolation via Recursive Context Allocation

Akide Liu
Bohan Zhuang
Weijie Wang
Gholamreza Haffari
Jinbo Xing
Chaojie Mao
Yefei He
Zeyu Zhang
Ye Li
Zihan Wang
Y. Liu

Minute-scale cinematic video generation is a central challenge for generative video models. Existing paradigms address only fragments of this challenge: single-shot extrapolation preserves an anchor but lacks cinematic structure, while multi-shot storytelling imposes structure yet remains free to invent its visual states rather than continue an observed one. We define Multi-Shot Video Extrapolation (MSVE), a task that extends an observed frame or clip into a sequence of cinematically structured shots while preserving anchor state and advancing narrative intent. This setting operates under the finite per-call generation budget of short-video models. We identify three coupled bottlenecks: (1) global planners over-specify unsupported details from full screenplays; (2) shot-level prompts dilute task-relevant state when carrying the complete story; and (3) temporal chaining turns generated frames into a lossy memory in which identity, scene, object, and action state decay. MSVE reveals that long-video failure is not merely a limitation of context length, but a failure of context allocation. We propose Recursive Context Allocation (ReCA), an inference-time framework that allocates context hierarchically across planning and generation. ReCA recursively decomposes MSVE into context-bounded subproblems, invokes frozen generators at leaf nodes, and propagates structured state updates across time. To evaluate this setting, we further propose MSVE-Bench and NB-Q, a source-grounded protocol with prompts purpose-built for 3 to 5 minute long-video generation, a regime not addressed by existing short-clip benchmarks. Compared to previous methods, ReCA improves average normalized score by 8 to 16 percent over the strongest competing controller and improves multi-shot consistency metrics by 28 to 43 percent. View the project page at https://reca.vmv.re.
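The recursive structure described in the abstract (decompose into context-bounded subproblems, call a frozen generator at the leaves, propagate structured state forward in time) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `State`, `Shot`, `recurse`, and `stub_generator`, the halving split rule, and the use of a beat count as the "budget" are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class State:
    """Hypothetical structured state (identity, scene, object, action facts)."""
    facts: Dict[str, str] = field(default_factory=dict)


@dataclass
class Shot:
    """One generated shot: the bounded prompt used and the state after it."""
    prompt: str
    state_after: State


# Hypothetical stand-in for a frozen short-video generator: it receives a
# bounded prompt plus the current state and returns state updates.
Generator = Callable[[str, State], Dict[str, str]]


def recurse(beats: List[str], state: State, generate: Generator,
            budget: int) -> List[Shot]:
    """Split the narrative beats until each piece fits the per-call budget."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if not beats:
        return []
    if len(beats) <= budget:
        # Leaf node: the prompt carries only local beats, not the full story.
        prompt = " ".join(beats)
        updates = generate(prompt, state)
        return [Shot(prompt, State({**state.facts, **updates}))]
    mid = len(beats) // 2  # assumed split rule: halve the beat list
    left = recurse(beats[:mid], state, generate, budget)
    # Propagate the structured state from the earlier half to the later half.
    right = recurse(beats[mid:], left[-1].state_after, generate, budget)
    return left + right


def stub_generator(prompt: str, state: State) -> Dict[str, str]:
    """Toy generator that only records what it was asked to render."""
    count = int(state.facts.get("shots_done", "0")) + 1
    return {"last_prompt": prompt, "shots_done": str(count)}


if __name__ == "__main__":
    anchor = State({"hero": "red coat"})
    shots = recurse(["a", "b", "c", "d", "e"], anchor, stub_generator, budget=2)
    for shot in shots:
        print(shot.prompt, shot.state_after.facts)
```

In this sketch the anchor facts survive every leaf call because state is passed explicitly as a small dictionary rather than being re-read from generated frames, which is the intuition behind treating the problem as one of context allocation rather than context length.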
