2602.20159v1 Feb 23, 2026 cs.CV

A Very Big Video Reasoning Suite

Qianli Ma (Shanghai Jiao Tong University). Citations: 317; h-index: 7
Daniel Khashabi. Citations: 353; h-index: 9
Vikash Kumar. Citations: 491; h-index: 7
Hanwen Xing. Citations: 127; h-index: 3
Ruisi Wang. Citations: 214; h-index: 7
Juyi Lin. Citations: 33; h-index: 2
Ran Ji. Citations: 15; h-index: 2
Thaddaus Wiedemer. Citations: 198; h-index: 5
Dezhi Luo. Citations: 123; h-index: 6
Lianyu Huang. Citations: 15; h-index: 2
Hang He. Citations: 61; h-index: 4
Yifan Zhou. Citations: 484; h-index: 11
Lingzi Guo. Citations: 16; h-index: 2
Lantao Mei. Citations: 3,278; h-index: 3
Jiacheng Li. Citations: 27; h-index: 3
Boyang Zhong. Citations: 17; h-index: 2
Ze Zhao. Citations: 48; h-index: 3
Gaoyun Fang. Citations: 134; h-index: 5
John Kitaoka. Citations: 12; h-index: 1
Yile Xu. Citations: 19; h-index: 3
Hua Xu. Citations: 20; h-index: 2
Kenton Blacutt. Citations: 12; h-index: 1
Tin Nguyen (Auburn University). Citations: 35; h-index: 3
Siyuan Song. Citations: 118; h-index: 4
Shao-Zhi Wen. Citations: 12; h-index: 1
Runming Wang. Citations: 26; h-index: 2
Yanzhi Wang. Citations: 150; h-index: 7
Ziqiao Ma (University of Michigan). Citations: 1,291; h-index: 17
Raphaël Millière. Citations: 93; h-index: 5
Freda Shi. Citations: 38; h-index: 3
Nuno Vasconcelos. Citations: 100; h-index: 5
A. Yuille. Citations: 152; h-index: 7
Yilun Du. Citations: 725; h-index: 12
Bo Li. Citations: 67; h-index: 3
Dahua Lin. Citations: 117; h-index: 8
Yijiang Li. Citations: 125; h-index: 6
Maijunxian Wang. Citations: 20; h-index: 2
Qingying Gao. Citations: 105; h-index: 6
Lei Yang. Citations: 196; h-index: 9
Yaoyao Qian. Citations: 91; h-index: 4
Jiahui Ge. Citations: 13; h-index: 1
Tianqi Zhao. Citations: 119; h-index: 5
Feng Yu. Citations: 80; h-index: 4
Wei Xiao. Citations: 56; h-index: 4
Yizheng Jiao. Citations: 576; h-index: 12
Pengcheng Xu. Citations: 237; h-index: 10
Haoran Sun. Citations: 66; h-index: 3
Linyang He. Citations: 89; h-index: 5
Mengyu Yang. Citations: 14; h-index: 1
Ziming Liu. Citations: 33; h-index: 3
Ziwei Liu. Citations: 4,425; h-index: 18
Zhongang Cai (MMLab@NTU, Nanyang Technological University). Citations: 4,497; h-index: 31
Jian Hou. Citations: 25; h-index: 3
Ze-Wen Hong. Citations: 247; h-index: 10
Hokin Deng. Citations: 127; h-index: 6
Danyang Zhang. Citations: 53; h-index: 4

Abstract

Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence in spatiotemporally consistent visual environments that go beyond what text can naturally capture, enabling intuitive reasoning over spatiotemporal structure such as continuity, interaction, and causality. However, systematically studying video reasoning and its scaling behavior is hindered by the lack of large-scale training data. To address this gap, we introduce the Very Big Video Reasoning (VBVR) Dataset, an unprecedentedly large-scale resource spanning 200 curated reasoning tasks following a principled taxonomy and over one million video clips, approximately three orders of magnitude larger than existing datasets. We further present VBVR-Bench, a verifiable evaluation framework that moves beyond model-based judging by incorporating rule-based, human-aligned scorers, enabling reproducible and interpretable diagnosis of video reasoning capabilities. Leveraging the VBVR suite, we conduct one of the first large-scale scaling studies of video reasoning and observe early signs of emergent generalization to unseen reasoning tasks. Together, VBVR lays a foundation for the next stage of research in generalizable video reasoning. The data, benchmark toolkit, and models are publicly available at https://video-reason.com/.
