2601.05848v1 Jan 09, 2026 cs.CV

Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

Nate Gillman

Brown University

Citations: 153

h-index: 5

Yin Zhou

Citations: 12,240

h-index: 23

Zitian Tang

Citations: 31

h-index: 4

Evan Luo

Citations: 44

h-index: 2

Arjan Chakravarthy

Citations: 5

h-index: 1

Michael Freeman

Cornell University

Citations: 91

h-index: 5

Charles Herrmann

Citations: 106

h-index: 3

Chen Sun

Citations: 62

h-index: 4

Daksh Aggarwal

Citations: 102

h-index: 4

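Recent advances in video generation have enabled the development of "world models" that can simulate potential futures for robotics and planning. However, specifying precise goals for these models remains difficult: text instructions are often too abstract to capture physical nuances, and target images are often hard to specify for dynamic tasks. To address this, we introduce Goal Force, a new framework that lets users define goals through explicit force vectors and intermediate dynamics, reflecting how humans conceptualize physical tasks. We train a video generation model on synthetic causal primitives, such as elastic collisions and falling dominos, so that the model learns to propagate forces through time and space. Although it is trained only on simple physics data, our model shows strong zero-shot generalization to complex real-world scenarios such as tool manipulation and multi-object causal chains. Our results suggest that grounding video generation in fundamental physical interactions lets models emerge as implicit neural physics simulators, enabling precise, physics-aware planning without relying on external engines. We release all datasets, code, model weights, and interactive video demos on our project page.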

Original Abstract

Recent advancements in video generation have enabled the development of "world models" capable of simulating potential futures for robotics and planning. However, specifying precise goals for these models remains a challenge; text instructions are often too abstract to capture physical nuances, while target images are frequently infeasible to specify for dynamic tasks. To address this, we introduce Goal Force, a novel framework that allows users to define goals via explicit force vectors and intermediate dynamics, mirroring how humans conceptualize physical tasks. We train a video generation model on a curated dataset of synthetic causal primitives, such as elastic collisions and falling dominos, teaching it to propagate forces through time and space. Despite being trained on simple physics data, our model exhibits remarkable zero-shot generalization to complex, real-world scenarios, including tool manipulation and multi-object causal chains. Our results suggest that by grounding video generation in fundamental physical interactions, models can emerge as implicit neural physics simulators, enabling precise, physics-aware planning without reliance on external engines. We release all datasets, code, model weights, and interactive video demos at our project page.
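
To make the "elastic collision" primitive mentioned in the abstract concrete, here is a minimal sketch of the textbook one-dimensional case, together with a force applied over a short time window. It is not the authors' data pipeline or model: the names `Body`, `apply_impulse`, and `elastic_collision_1d` are hypothetical and chosen only for illustration, and the physics is the standard conservation of momentum and kinetic energy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Body:
    """Hypothetical point mass moving along a single axis."""
    mass: float      # kilograms
    velocity: float  # metres per second


def apply_impulse(body: Body, force: float, duration: float) -> Body:
    """Apply a constant force for `duration` seconds (impulse = F * dt = m * dv)."""
    return Body(body.mass, body.velocity + force * duration / body.mass)


def elastic_collision_1d(a: Body, b: Body) -> Tuple[float, float]:
    """Return the post-collision velocities of a and b for a perfectly
    elastic head-on collision (momentum and kinetic energy both conserved)."""
    total = a.mass + b.mass
    va = ((a.mass - b.mass) * a.velocity + 2.0 * b.mass * b.velocity) / total
    vb = ((b.mass - a.mass) * b.velocity + 2.0 * a.mass * a.velocity) / total
    return va, vb


if __name__ == "__main__":
    # Push a resting ball with a force, then let it hit an identical resting ball.
    striker = apply_impulse(Body(mass=1.0, velocity=0.0), force=4.0, duration=0.5)
    target = Body(mass=1.0, velocity=0.0)
    print(elastic_collision_1d(striker, target))
```

With equal masses the two bodies exchange velocities, so the force applied to the first ball is carried over entirely to the second. This is the simplest instance of a force propagating through a causal chain.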

5 Citations

0 Influential

11.5 Altmetric

62.5 Score

