2601.08828v1 Jan 13, 2026 cs.CV

Motion Attribution Analysis in Video Generation Models

Motion Attribution for Video Generation

Xindi Wu

Citations: 152

h-index: 7

Despoina Paschalidou

Citations: 561

h-index: 2

Jun Gao

Citations: 626

h-index: 9

Antonio Torralba

Citations: 400

h-index: 4

Laura Leal-Taixé

Citations: 2

h-index: 1

Olga Russakovsky

Citations: 1,393

h-index: 15

Sanja Fidler

Citations: 2,076

h-index: 20

Jonathan Lorraine

Citations: 0

h-index: 0

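Despite the rapid progress of video generation models, the influence of data on motion is not well understood. In this work, we propose Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework. Motive is applicable to modern, large-scale, high-quality video datasets and models. We use Motive to study which fine-tuning data affect temporal dynamics positively or negatively. By using motion-weighted loss masks to separate static appearance from temporal dynamics, Motive enables efficient and scalable motion-specific influence computation. On text-to-video generation models, Motive identifies the data clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. Using the high-influence data selected by Motive, our method improves both motion smoothness and dynamic degree on VBench and achieves a 74.1% human preference win rate over the pretrained base model. To our knowledge, Motive is the first framework to attribute motion rather than visual appearance in video generation models and to use that attribution to curate fine-tuning data.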

Original Abstract

Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models. We use this to study which fine-tuning clips improve or degrade temporal dynamics. Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation. On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. With Motive-selected high-influence data, our method improves both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.
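To make the idea of a motion-weighted loss mask combined with gradient-based attribution more concrete, the following is a minimal toy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the mask is built from absolute frame differences (the paper may use a different motion signal), the "model" is a per-pixel gain `theta` that predicts the next frame from the current one, and the influence score is a plain gradient dot product. The function names `motion_mask`, `motion_weighted_grad`, and `influence` are hypothetical.

```python
import numpy as np


def motion_mask(video, eps=1e-8):
    """Per-pixel motion weights in [0, 1] from absolute frame differences.

    video: array of shape (T, H, W). Returns shape (T-1, H, W).
    Assumption: frame differencing stands in for whatever motion
    estimate the actual method uses.
    """
    diff = np.abs(np.diff(video, axis=0))
    return diff / (diff.max() + eps)


def motion_weighted_grad(theta, video):
    """Gradient of a motion-weighted squared error for a toy predictor.

    Toy model (assumption): next_frame is approximated by theta * current_frame,
    with theta of shape (H, W).
    Loss: mean over time of w * (theta * cur - nxt)**2, where w is the mask.
    """
    cur, nxt = video[:-1], video[1:]
    w = motion_mask(video)
    resid = theta * cur - nxt
    # d(loss)/d(theta): average over the time axis, one value per pixel.
    grad = np.mean(2.0 * w * resid * cur, axis=0)
    return grad.ravel()


def influence(theta, train_video, query_video):
    """Gradient dot product between a training clip and a query clip.

    A positive score means a gradient step on the training clip would also
    reduce the motion-weighted loss on the query clip (to first order).
    """
    g_train = motion_weighted_grad(theta, train_video)
    g_query = motion_weighted_grad(theta, query_video)
    return float(g_train @ g_query)
```

In this sketch a perfectly static clip gets an all-zero mask, so its gradient and its influence on any query vanish, which mirrors the stated goal of separating temporal dynamics from static appearance. A real system would replace the toy predictor with the video model's training loss and would likely need gradient compression to scale.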

0 Citations

0 Influential

10 Altmetric

50.0 Score
