2602.00148v2 Jan 29, 2026 cs.CV

Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields

Yixin Zhu

Shiqian Li

Ruihong Shen

Junfeng Ni

Chang Pan

Chi Zhang

Abstract

Predicting physical dynamics from raw visual data remains a major challenge in AI. While recent video generation models have achieved impressive visual quality, they still cannot consistently generate physically plausible videos because they do not model physical laws. Recent approaches combining 3D Gaussian splatting and physics engines can produce physically plausible videos, but are hindered by high computational costs in both reconstruction and simulation, and often lack robustness in complex real-world scenarios. To address these issues, we introduce Neural Gaussian Force Field (NGFF), an end-to-end neural framework that integrates 3D Gaussian perception with physics-based dynamic modeling to generate interactive, physically realistic 4D videos from multi-view RGB inputs, running two orders of magnitude faster than prior Gaussian simulators. To support training, we also present GSCollision, a 4D Gaussian dataset featuring diverse materials, multi-object interactions, and complex scenes, totaling over 640k rendered physical videos (~4 TB). Evaluations on synthetic and real 3D scenarios show NGFF's strong generalization and robustness in physical reasoning, advancing video prediction towards physics-grounded world models.
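The abstract does not describe NGFF's architecture, so the following is only an illustrative sketch of the general idea it names: advancing 3D Gaussian centers in time under a force field. It uses semi-implicit (symplectic) Euler integration. The names `rollout`, `gravity_with_floor`, `masses`, and the `force_fn(positions, velocities)` interface are all hypothetical and do not come from the paper. In a learned system, `force_fn` is where a neural force model would plug in; here a hand-written toy field (gravity plus a penalty spring-damper floor at z = 0) stands in for it.

```python
import numpy as np


def rollout(positions, velocities, masses, force_fn, dt=1e-2, steps=100):
    """Semi-implicit Euler rollout of Gaussian centers under a force field.

    positions, velocities: (N, 3) arrays of Gaussian center states.
    masses: (N,) per-Gaussian masses (hypothetical; not specified in the abstract).
    force_fn: callable (x, v) -> (N, 3) forces; a placeholder for a learned model.
    Returns an array of shape (steps + 1, N, 3) with the center trajectory.
    """
    x = np.asarray(positions, dtype=float).copy()
    v = np.asarray(velocities, dtype=float).copy()
    m = np.asarray(masses, dtype=float)[:, None]  # (N, 1) for broadcasting
    trajectory = [x.copy()]
    for _ in range(steps):
        a = force_fn(x, v) / m   # acceleration from the force field
        v = v + dt * a           # update velocity first...
        x = x + dt * v           # ...then position with the new velocity
        trajectory.append(x.copy())
    return np.stack(trajectory)


def gravity_with_floor(x, v, g=9.81, k=1e4, c=20.0, mass=1.0):
    """Toy force field: gravity plus a spring-damper that pushes centers out of z < 0."""
    f = np.zeros_like(x)
    f[:, 2] -= mass * g
    penetration = np.minimum(x[:, 2], 0.0)   # negative when below the floor
    in_contact = penetration < 0.0
    f[:, 2] += -k * penetration              # restoring force upward
    f[:, 2] -= c * v[:, 2] * in_contact      # damping only while in contact
    return f


# Usage: drop two unit-mass Gaussian centers from heights 1 m and 2 m.
x0 = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 2.0]])
v0 = np.zeros_like(x0)
traj = rollout(x0, v0, np.ones(2), gravity_with_floor, dt=1e-3, steps=2000)
print(traj.shape)  # (2001, 2, 3)
```

Updating velocity before position is the usual choice for stiff contact forces like the penalty floor here, because it is more stable than explicit Euler at the same step size. A real Gaussian simulator would also need to move each Gaussian's covariance and orientation, which this sketch ignores.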
