2603.16868v1 Mar 17, 2026 cs.CV

MessyKitchens: Contact-rich object-level 3D scene reconstruction

J. Ansari

Ran Ding

Fabio Pizzati

Ivan Laptev

Abstract

Monocular 3D scene reconstruction has recently seen significant progress. Powered by modern neural architectures and large-scale data, recent methods achieve high performance in depth estimation from a single image. Meanwhile, reconstructing and decomposing common scenes into individual 3D objects remains a hard challenge due to the large variety of objects, frequent occlusions and complex object relations. Notably, beyond shape and pose estimation of individual objects, applications in robotics and animation require physically plausible scene reconstruction where objects obey the physical principles of non-penetration and realistic contact. In this work we advance object-level scene reconstruction along two directions. First, we introduce MessyKitchens, a new dataset of real-world scenes featuring cluttered environments and providing high-fidelity object-level ground truth in terms of 3D object shapes, poses and accurate object contacts. Second, we build on the recent SAM 3D approach for single-object reconstruction and extend it with a Multi-Object Decoder (MOD) for joint object-level scene reconstruction. To validate our contributions, we show that MessyKitchens significantly improves on previous datasets in registration accuracy and inter-object penetration. We also evaluate our multi-object reconstruction approach on three datasets and demonstrate consistent and significant improvements of MOD over the state of the art. Our new benchmark, code and pre-trained models will become publicly available on our project website: https://messykitchens.github.io/.
