2604.03315v1 Apr 01, 2026 cs.CV

StoryBlender: Inter-Shot Consistent and Editable 3D Storyboard with Spatial-temporal Dynamics

Zhenhong Sun (Citations: 37, h-index: 4)
Yatao Bian (Citations: 4, h-index: 1)
Daoyi Dong (Citations: 15, h-index: 3)
Hongdong Li (Citations: 102, h-index: 4)
Jiaming Bian (Citations: 20, h-index: 1)
Yueh-Hua Wu (Citations: 148, h-index: 4)
Yifu Wang (Citations: 42, h-index: 2)
Huadong Mo (Citations: 3, h-index: 1)
Bingliang Li (Citations: 66, h-index: 4)

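Storyboarding is a core skill in visual storytelling for film, animation, and games. Automating it, however, requires a system with two properties that current approaches rarely satisfy together: inter-shot consistency and explicit editability. 2D diffusion-based generators produce vivid images but often suffer from identity drift and limited geometric control. Traditional 3D animation workflows, by contrast, offer consistency and editability but depend on expert, labor-intensive authoring. This paper introduces StoryBlender, a grounded 3D storyboard generation framework governed by a Story-centric Reflection Scheme. The StoryBlender system is built on a three-stage pipeline: (1) Semantic-Spatial Grounding, which builds a continuity memory graph that separates global assets from shot-specific variables to maintain long-horizon consistency; (2) Canonical Asset Materialization, which instantiates all entities in a unified coordinate space to preserve visual identity; and (3) Spatial-Temporal Dynamics, which realizes layout design and cinematic evolution through visual metrics. StoryBlender organizes multiple agents hierarchically inside a verification loop and iteratively self-corrects spatial errors using engine-verified feedback. The resulting 3D scenes allow direct, precise editing of cameras and visual assets while preserving continuity across shots. Experiments show that StoryBlender substantially improves consistency and editability over both diffusion-based and 3D-grounded baselines. Code, data, and a demonstration video will be made available at https://engineeringai-lab.github.io/StoryBlender/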

Original Abstract

Storyboarding is a core skill in visual storytelling for film, animation, and games. However, automating this process requires a system to achieve two properties that current approaches rarely satisfy simultaneously: inter-shot consistency and explicit editability. While 2D diffusion-based generators produce vivid imagery, they often suffer from identity drift along with limited geometric control; conversely, traditional 3D animation workflows are consistent and editable but require expert-heavy, labor-intensive authoring. We present StoryBlender, a grounded 3D storyboard generation framework governed by a Story-centric Reflection Scheme. At its core, we propose the StoryBlender system, which is built on a three-stage pipeline: (1) Semantic-Spatial Grounding, to construct a continuity memory graph to decouple global assets from shot-specific variables for long-horizon consistency; (2) Canonical Asset Materialization, to instantiate entities in a unified coordinate space to maintain visual identity; and (3) Spatial-Temporal Dynamics, to achieve layout design and cinematic evolution through visual metrics. By orchestrating multiple agents in a hierarchical manner within a verification loop, StoryBlender iteratively self-corrects spatial hallucinations via engine-verified feedback. The resulting native 3D scenes support direct, precise editing of cameras and visual assets while preserving unwavering multi-shot continuity. Experiments demonstrate that StoryBlender significantly improves consistency and editability over both diffusion-based and 3D-grounded baselines. Code, data, and demonstration video will be available on https://engineeringai-lab.github.io/StoryBlender/
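The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it names: a continuity memory that keeps global assets separate from shot-specific variables, and a verification loop that re-checks a layout after each correction. Every name here (`GlobalAsset`, `ShotState`, `ContinuityMemory`, `find_overlaps`, `correct_layout`) is hypothetical, the 2D overlap test is a stand-in for the engine-verified feedback the paper describes, and none of this should be read as the authors' method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class GlobalAsset:
    """Identity shared by all shots (hypothetical schema, not from the paper)."""
    asset_id: str
    description: str
    radius: float  # coarse bounding radius in scene units


@dataclass
class ShotState:
    """Shot-specific variables: where each shared asset sits in this shot."""
    shot_id: int
    positions: dict[str, tuple[float, float]] = field(default_factory=dict)


@dataclass
class ContinuityMemory:
    """Global assets are stored once; shots only reference them by id."""
    assets: dict[str, GlobalAsset] = field(default_factory=dict)
    shots: list[ShotState] = field(default_factory=list)


def find_overlaps(memory: ContinuityMemory, shot: ShotState) -> list[tuple[str, str]]:
    """Stand-in for an engine check: list asset pairs whose bounds intersect."""
    ids = sorted(shot.positions)
    overlaps = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            (ax, ay), (bx, by) = shot.positions[a], shot.positions[b]
            dist = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
            if dist < memory.assets[a].radius + memory.assets[b].radius:
                overlaps.append((a, b))
    return overlaps


def correct_layout(memory: ContinuityMemory, shot: ShotState, max_rounds: int = 10) -> int:
    """Check, nudge, re-check. Returns the number of correction rounds applied.

    If max_rounds is reached, the layout may still contain overlaps.
    """
    for round_idx in range(max_rounds):
        overlaps = find_overlaps(memory, shot)
        if not overlaps:
            return round_idx
        for a, b in overlaps:
            bx, by = shot.positions[b]
            gap = memory.assets[a].radius + memory.assets[b].radius
            # Naive correction (an assumption): push the second asset along +x.
            shot.positions[b] = (bx + gap, by)
    return max_rounds
```

Because a shot stores only positions keyed by asset id, editing a shot's layout cannot change an asset's description or size, which is one simple way to read the "decouple global assets from shot-specific variables" idea.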

0 Citations, 0 Influential, 2 Altmetric, 10.0 Score
