2603.22286v1 Mar 23, 2026 cs.CV

WorldCache: Content-Aware Caching for Accelerated Video World Models

Ahmed Heakl (Citations: 271, h-index: 8)

F. Khan (Citations: 308, h-index: 10)

Umair Nawaz (Citations: 47, h-index: 3)

Ufaq Khan (Citations: 62, h-index: 4)

Abdelrahman M. Shaker (Citations: 2,129, h-index: 15)

Salman Khan (Citations: 9, h-index: 2)

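Diffusion Transformers (DiTs) enable high-quality video world models, but sequential denoising and expensive spatio-temporal attention make them computationally costly. Training-free feature caching speeds up inference by reusing intermediate activations, yet most existing methods rely on a Zero-Order Hold assumption: when the global change is small, cached features are reused as static snapshots. In dynamic scenes this often causes ghosting, blur, and motion inconsistencies. This paper proposes WorldCache, a perception-constrained dynamical caching framework that improves both when and how features are reused. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-aware threshold scheduling across diffusion steps. Together these enable adaptive, motion-consistent feature reuse without retraining. Evaluated with Cosmos-Predict2.5-2B on PAI-Bench, WorldCache achieves a 2.3× inference speedup while preserving 99.4% of baseline quality, substantially outperforming prior training-free caching approaches. The code is available at [https://umair1221.github.io/World-Cache/](https://umair1221.github.io/World-Cache/).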

Original Abstract

Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption, i.e., reusing cached features as static snapshots when global drift is small. This often leads to ghosting artifacts, blur, and motion inconsistencies in dynamic scenes. We propose WorldCache, a Perception-Constrained Dynamical Caching framework that improves both when and how to reuse features. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-aware threshold scheduling across diffusion steps. Our cohesive approach enables adaptive, motion-consistent feature reuse without retraining. On Cosmos-Predict2.5-2B evaluated on PAI-Bench, WorldCache achieves 2.3× inference speedup while preserving 99.4% of baseline quality, substantially outperforming prior training-free caching approaches. Our code can be accessed on [World-Cache](https://umair1221.github.io/World-Cache/).
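
To make the four ingredients named in the abstract more concrete, the following is a minimal NumPy sketch of how a caching decision of this kind could be structured. It is not the authors' implementation: every function name, the formulas, the default constants, and the direction of each adjustment (stricter thresholds under high motion and in early denoising steps, linear extrapolation as a stand-in for "blending") are assumptions chosen for illustration only. Warping is not modeled here.

```python
import numpy as np

# Hypothetical sketch; names, formulas and constants are illustrative assumptions,
# not taken from the WorldCache paper or code.

def saliency_weighted_drift(current, cached, saliency):
    """Relative L1 drift between two (tokens, channels) feature maps.

    Each token's drift is weighted by a non-negative saliency score, so that
    changes in important regions count more than changes in the background.
    """
    weights = saliency / (saliency.sum() + 1e-8)
    diff = np.abs(current - cached).mean(axis=-1)    # per-token absolute change
    scale = np.abs(cached).mean(axis=-1) + 1e-8      # per-token magnitude
    return float((weights * diff / scale).sum())

def motion_adaptive_threshold(base, motion, sensitivity=1.0):
    """Shrink the reuse threshold as the (assumed scalar) motion estimate grows."""
    return base / (1.0 + sensitivity * motion)

def phase_aware_scale(step, num_steps, early=0.5, late=1.5):
    """Linearly interpolate a threshold multiplier over the denoising schedule.

    Assumption: early steps are treated more strictly than late steps.
    """
    t = step / max(num_steps - 1, 1)
    return early + (late - early) * t

def should_reuse(drift, base, motion, step, num_steps):
    """Reuse cached features only if the drift stays under the adapted threshold."""
    tau = motion_adaptive_threshold(base, motion) * phase_aware_scale(step, num_steps)
    return drift < tau

def blended_reuse(cached_prev, cached_last, alpha=0.5):
    """First-order alternative to a zero-order hold.

    Instead of returning cached_last unchanged, extrapolate along the trend
    between the two most recent cached feature maps.
    """
    return cached_last + alpha * (cached_last - cached_prev)
```

In this sketch, a zero-order hold corresponds to calling `blended_reuse` with `alpha=0`, which returns the last cached features unchanged. A sampler would call `should_reuse` once per denoising step and run the full transformer blocks whenever it returns `False`.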
