2601.09031v1 Jan 13, 2026 cs.RO

Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation

Xuetao Li

Miao Li

Wenke Huang

Mang Ye

Jifeng Xuan

Bo Du

Sheng Liu

Abstract

Humanoid robot manipulation is a crucial research area for executing diverse human-level tasks, and it involves both high-level semantic reasoning and low-level action generation. However, precise scene understanding and sample-efficient learning from human demonstrations remain critical challenges that severely limit the applicability and generalizability of existing frameworks. This paper presents RGMP-S (Recurrent Geometric-prior Multimodal Policy with Spiking features), a novel framework that supports both high-level skill reasoning and data-efficient motion synthesis. To ground high-level reasoning in physical reality, we leverage lightweight 2D geometric inductive biases to enable precise 3D scene understanding within a vision-language model. Specifically, we construct a Long-horizon Geometric Prior Skill Selector that aligns semantic instructions with spatial constraints, achieving robust generalization in unseen environments. To address data efficiency in robotic action generation, we introduce a Recursive Adaptive Spiking Network, which parameterizes robot-object interactions via recursive spiking to maintain spatiotemporal consistency. This design distills long-horizon dynamic features while mitigating overfitting when demonstrations are sparse. We conduct extensive experiments on the ManiSkill simulation benchmark and on three heterogeneous real-world robotic systems: a custom-developed humanoid, a desktop manipulator, and a commercial robotic platform. The results show that our method outperforms state-of-the-art baselines and validate the effectiveness of the proposed modules across diverse generalization scenarios. To facilitate reproducibility, the source code and video demonstrations are publicly available at https://github.com/xtli12/RGMP-S.git.
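The abstract does not specify the neuron model used inside the Recursive Adaptive Spiking Network. For orientation only, the sketch below shows a generic adaptive leaky integrate-and-fire (ALIF) recurrent layer. In this kind of layer, each spike raises a neuron's firing threshold, and that increase decays back on a slower time scale, so the layer retains a memory longer than its membrane time constant. The class name `AdaptiveRecurrentSpikingLayer`, the time constants, the weight initialization, and the reset-by-subtraction rule are all illustrative assumptions, not the RGMP-S implementation.

```python
import math
import random
from typing import List, Sequence


class AdaptiveRecurrentSpikingLayer:
    """Generic adaptive LIF (ALIF) layer with recurrent spiking connections.

    Hypothetical sketch: the class name, time constants, weight init and
    reset rule are illustrative assumptions, not the RGMP-S implementation.
    """

    def __init__(self, n_in: int, n_hidden: int, tau_mem: float = 20.0,
                 tau_adapt: float = 200.0, b0: float = 1.0, beta: float = 1.8,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.alpha = math.exp(-1.0 / tau_mem)    # membrane leak per time step
        self.rho = math.exp(-1.0 / tau_adapt)    # slower decay of adaptation
        self.b0 = b0                             # baseline firing threshold
        self.beta = beta                         # how much adaptation raises it
        # Feed-forward weights (hidden x input), scaled by fan-in.
        self.w_in = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                     for _ in range(n_hidden)]
        # Recurrent weights (hidden x hidden) with no self-connections.
        self.w_rec = [[0.0 if i == j else rng.gauss(0.0, 1.0 / math.sqrt(n_hidden))
                       for j in range(n_hidden)] for i in range(n_hidden)]

    def run(self, inputs: Sequence[Sequence[float]]) -> List[List[int]]:
        """Simulate the layer over a sequence of input vectors; return the spike raster."""
        v = [0.0] * self.n_hidden   # membrane potentials
        a = [0.0] * self.n_hidden   # adaptation traces
        s = [0] * self.n_hidden     # spikes emitted on the previous step
        raster: List[List[int]] = []
        for x in inputs:
            new_s: List[int] = []
            for i in range(self.n_hidden):
                # Input current: feed-forward drive plus recurrent spikes from t-1.
                current = sum(w * xi for w, xi in zip(self.w_in[i], x))
                current += sum(w * sj for w, sj in zip(self.w_rec[i], s))
                # Reset by subtraction, using the threshold in force when the
                # previous spike was emitted.
                prev_threshold = self.b0 + self.beta * a[i]
                v[i] = self.alpha * v[i] + (1.0 - self.alpha) * current - prev_threshold * s[i]
                # Each spike nudges the adaptation trace up; it then decays slowly.
                a[i] = self.rho * a[i] + (1.0 - self.rho) * s[i]
                threshold = self.b0 + self.beta * a[i]
                new_s.append(1 if v[i] >= threshold else 0)
            s = new_s
            raster.append(s)
        return raster


if __name__ == "__main__":
    layer = AdaptiveRecurrentSpikingLayer(n_in=4, n_hidden=8)
    # Hypothetical input: 50 steps of one constant 4-D feature vector.
    raster = layer.run([[1.0, 0.5, -0.2, 0.8]] * 50)
    print("spikes per neuron:", [sum(col) for col in zip(*raster)])
```

Because each threshold relaxes with the slower constant `tau_adapt`, a neuron's recent firing history shapes its response over many steps. This is one common way spiking networks carry long-horizon temporal context. The abstract does not state whether RGMP-S uses this particular mechanism.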
