2604.22160v1 Apr 24, 2026 cs.CV

GenMatter: Perceiving Physical Objects with Generative Matter Models

Josh Tenenbaum

Eric Li

Arijit Dasgupta

Yoni Friedman

Mathieu Huot

Vikash K. Mansinghka

T. O’Connell

William T. Freeman

Abstract

Human visual perception offers valuable insights for understanding computational principles of motion-based scene interpretation. Humans robustly detect and segment moving entities that constitute independently moveable chunks of matter, whether observing sparse moving dots, textured surfaces, or naturalistic scenes. In contrast, existing computer vision systems lack a unified approach that works across these diverse settings. Inspired by principles of human perception, we propose a generative model that hierarchically groups low-level motion cues and high-level appearance features into particles (small Gaussians representing local matter), and groups particles into clusters capturing coherently and independently moveable physical entities. We develop a hardware-accelerated inference algorithm based on parallelized block Gibbs sampling to recover stable particle motion and groupings. Our model operates on different kinds of inputs (random dots, stylized textures, or naturalistic RGB video), enabling it to work across settings where biological vision succeeds but existing computer vision approaches do not. We validate this unified framework across three domains: on 2D random dot kinematograms, our approach captures human object perception including graded uncertainty across ambiguous conditions; on a Gestalt-inspired dataset of camouflaged rotating objects, our approach recovers correct 3D structure from motion and thereby accurate 2D object segmentation; and on naturalistic RGB videos, our model tracks the moving 3D matter that makes up deforming objects, enabling robust object-level scene understanding. This work thus establishes a general framework for motion-based perception grounded in principles of human vision.
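
The grouping step mentioned in the abstract (particles assigned to clusters, inferred with block Gibbs sampling) can be illustrated with a toy sketch. This is not the paper's model or inference code. It assumes a much simpler setting: each particle is reduced to a 2D velocity, the number of clusters is fixed, and noise is isotropic Gaussian. All names (`make_toy_velocities`, `sample_assignments`, `sample_means`, `block_gibbs`) and all parameter values are hypothetical. Each sweep alternates two blocks: every label is resampled given the cluster means, then every cluster mean is resampled given the labels. Within a block the variables are conditionally independent, which is the property that makes a parallel, hardware-accelerated version possible.

```python
import math
import random


def make_toy_velocities(n_per_group=20, noise=0.05, seed=1):
    """Toy data (assumption): two groups of particles moving right and left."""
    rng = random.Random(seed)
    right = [(1 + rng.gauss(0, noise), rng.gauss(0, noise)) for _ in range(n_per_group)]
    left = [(-1 + rng.gauss(0, noise), rng.gauss(0, noise)) for _ in range(n_per_group)]
    return right + left


def sample_assignments(velocities, means, sigma, rng):
    """Block 1: resample every particle's cluster label given the cluster means."""
    labels = []
    for vx, vy in velocities:
        # Unnormalised log-likelihood of each cluster under isotropic Gaussian noise.
        logp = [-((vx - mx) ** 2 + (vy - my) ** 2) / (2 * sigma ** 2) for mx, my in means]
        top = max(logp)  # subtract the max for numerical stability
        weights = [math.exp(lp - top) for lp in logp]
        labels.append(rng.choices(range(len(means)), weights=weights)[0])
    return labels


def sample_means(velocities, labels, k, sigma, tau, rng):
    """Block 2: resample every cluster mean given the labels.

    Conjugate update with prior mean ~ N(0, tau^2 I) and known noise sigma.
    """
    means = []
    for c in range(k):
        members = [v for v, z in zip(velocities, labels) if z == c]
        precision = len(members) / sigma ** 2 + 1 / tau ** 2
        std = math.sqrt(1 / precision)
        post_x = sum(v[0] for v in members) / sigma ** 2 / precision
        post_y = sum(v[1] for v in members) / sigma ** 2 / precision
        means.append((rng.gauss(post_x, std), rng.gauss(post_y, std)))
    return means


def block_gibbs(velocities, k=2, sigma=0.1, tau=2.0, sweeps=100, seed=0):
    """Alternate the two blocks and return the labels and means of the last sweep."""
    rng = random.Random(seed)
    means = rng.sample(velocities, k)  # initialise the means at k distinct data points
    labels = []
    for _ in range(sweeps):
        labels = sample_assignments(velocities, means, sigma, rng)
        means = sample_means(velocities, labels, k, sigma, tau, rng)
    return labels, means


if __name__ == "__main__":
    labels, means = block_gibbs(make_toy_velocities())
    print("labels:", labels)
    print("cluster mean velocities:", means)
```

The returned labels are a single posterior sample, not a point estimate. Keeping the labels from many sweeps and counting how often two particles share a cluster would give a graded measure of grouping uncertainty, in the spirit of the graded uncertainty the abstract reports for ambiguous displays.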
