2602.15733v1 Feb 17, 2026 cs.RO

MeshMimic: Geometry-Aware Humanoid Motion Learning through 3D Scene Reconstruction

Jian Tang, Qiang Zhang, Jiahao Ma, Peiran Liu, Zeran Su, Zifan Wang, Jingkai Sun, Wei Cui, Jialing Yu, Gang Han, Wen Zhao, Pihai Sun, Kangning Yin, Jiaxu Wang, Jiahang Cao, Lingfeng Zhang, Haotai Cheng, Junwei Liang, Renjing Xu, Yijie Guo, Shuai Shi, Xiaoshuai Hao, Yiding Ji

Humanoid motion control has seen significant breakthroughs in recent years, with deep reinforcement learning (RL) emerging as a primary catalyst for achieving complex, human-like behaviors. However, the high dimensionality and intricate dynamics of humanoid robots make manual motion design impractical, which has led to heavy reliance on expensive motion capture (MoCap) data. These datasets are costly to acquire and frequently lack the geometric context of the surrounding physical environment. Consequently, existing motion synthesis frameworks often decouple motion from scene, producing physical inconsistencies such as contact slippage or mesh penetration in terrain-aware tasks. In this work, we present MeshMimic, a framework that bridges 3D scene reconstruction and embodied intelligence so that humanoid robots can learn coupled "motion-terrain" interactions directly from video. Using state-of-the-art 3D vision models, our framework segments and reconstructs both human trajectories and the underlying 3D geometry of terrains and objects. We introduce an optimization algorithm based on kinematic consistency that extracts high-quality motion data from noisy visual reconstructions, together with a contact-invariant retargeting method that transfers human-environment interaction features to the humanoid agent. Experimental results show that MeshMimic achieves robust, highly dynamic performance across diverse and challenging terrains. Our approach demonstrates that a low-cost pipeline using only consumer-grade monocular sensors can support training of complex physical interactions, offering a scalable path toward the autonomous evolution of humanoid robots in unstructured environments.
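The abstract does not spell out the kinematic-consistency objective. As a purely illustrative sketch, and not the authors' algorithm, one common way to clean noisy monocular joint estimates is to minimize an energy that balances fidelity to the observations, temporal smoothness (penalized second differences), and constant bone lengths. Everything below is an assumption: the three-term energy, the unit weights, the median-based rest lengths, the plain gradient-descent solver, and all helper names (`bone_targets`, `energy`, `gradient`, `refine`, `synthetic_leg`), which are hypothetical.

```python
import math
import random
from statistics import median

# Hypothetical sketch, NOT the MeshMimic algorithm: refine noisy per-frame
# 3D joint estimates by gradient descent on an assumed energy with terms
#   data   : stay close to the observed joints
#   smooth : penalize second differences (accelerations) over time
#   bone   : keep each bone near a fixed rest length


def bone_targets(obs, bones):
    """Rest length per bone, assumed to be the median observed length."""
    return [median(math.dist(f[p], f[c]) for f in obs) for p, c in bones]


def energy(x, obs, bones, lengths, w_data=1.0, w_smooth=1.0, w_bone=1.0):
    """Total energy; x[t][j] is the 3D position of joint j at frame t."""
    T, J = len(x), len(x[0])
    e = 0.0
    for t in range(T):
        for j in range(J):
            e += w_data * sum((x[t][j][k] - obs[t][j][k]) ** 2 for k in range(3))
            if 0 < t < T - 1:
                e += w_smooth * sum(
                    (x[t + 1][j][k] - 2 * x[t][j][k] + x[t - 1][j][k]) ** 2
                    for k in range(3))
        for (p, c), rest in zip(bones, lengths):
            e += w_bone * (math.dist(x[t][p], x[t][c]) - rest) ** 2
    return e


def gradient(x, obs, bones, lengths, w_data=1.0, w_smooth=1.0, w_bone=1.0):
    """Analytic gradient of energy() with respect to every coordinate of x."""
    T, J = len(x), len(x[0])
    g = [[[0.0, 0.0, 0.0] for _ in range(J)] for _ in range(T)]
    for t in range(T):
        for j in range(J):
            for k in range(3):
                g[t][j][k] += 2 * w_data * (x[t][j][k] - obs[t][j][k])
                if 0 < t < T - 1:
                    a = x[t + 1][j][k] - 2 * x[t][j][k] + x[t - 1][j][k]
                    g[t - 1][j][k] += 2 * w_smooth * a
                    g[t][j][k] -= 4 * w_smooth * a
                    g[t + 1][j][k] += 2 * w_smooth * a
        for (p, c), rest in zip(bones, lengths):
            d = math.dist(x[t][p], x[t][c])
            if d < 1e-9:
                continue  # bone direction is undefined when length is ~0
            s = 2 * w_bone * (d - rest) / d
            for k in range(3):
                diff = x[t][c][k] - x[t][p][k]
                g[t][c][k] += s * diff
                g[t][p][k] -= s * diff
    return g


def refine(obs, bones, iters=500, lr=0.02):
    """Gradient descent starting from the observations (unit weights).
    lr is small relative to the energy's curvature, so steps should descend."""
    lengths = bone_targets(obs, bones)
    x = [[list(p) for p in frame] for frame in obs]
    for _ in range(iters):
        g = gradient(x, obs, bones, lengths)
        for t in range(len(x)):
            for j in range(len(x[t])):
                for k in range(3):
                    x[t][j][k] -= lr * g[t][j][k]
    return x, lengths


def synthetic_leg(frames=40, noise=0.02, seed=0):
    """Hypothetical data: a swinging hip-knee-ankle chain with 0.45 m bones,
    plus Gaussian noise on every coordinate."""
    rng = random.Random(seed)
    seq = []
    for t in range(frames):
        a = 0.3 * math.sin(0.2 * t)  # swing angle in radians
        hip = (0.05 * t, 0.0, 0.9)
        knee = (hip[0] + 0.45 * math.sin(a), 0.0, hip[2] - 0.45 * math.cos(a))
        ankle = (knee[0] + 0.45 * math.sin(a), 0.0, knee[2] - 0.45 * math.cos(a))
        seq.append([[c + rng.gauss(0.0, noise) for c in p] for p in (hip, knee, ankle)])
    return seq


if __name__ == "__main__":
    bones = [(0, 1), (1, 2)]  # hip-knee and knee-ankle, by joint index
    noisy = synthetic_leg()
    refined, lengths = refine(noisy, bones)
    print("energy before:", round(energy(noisy, noisy, bones, lengths), 4))
    print("energy after: ", round(energy(refined, noisy, bones, lengths), 4))
```

Run as a script, it prints the energy of the noisy input and of the refined trajectory on synthetic data. Contact and terrain-penetration terms, which the abstract's contact-invariant retargeting addresses, are deliberately left out to keep the example small.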
