2602.07439v1 Feb 07, 2026 cs.RO

TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control

Weiji Xie
Jiakun Zheng
Jinrui Han
Jiyuan Shi
Weinan Zhang
Chenjia Bai
Xuelong Li

Abstract

Recent advances in humanoid whole-body motion tracking have enabled the execution of diverse and highly coordinated motions on real hardware. However, existing controllers are commonly driven either by predefined motion trajectories, which offer limited flexibility when user intent changes, or by continuous human teleoperation, which requires constant human involvement and limits autonomy. This work addresses the problem of how to drive a universal humanoid controller in a real-time and interactive manner. We present TextOp, a real-time text-driven humanoid motion generation and control framework that supports streaming language commands and on-the-fly instruction modification during execution. TextOp adopts a two-level architecture in which a high-level autoregressive motion diffusion model continuously generates short-horizon kinematic trajectories conditioned on the current text input, while a low-level motion tracking policy executes these trajectories on a physical humanoid robot. By bridging interactive motion generation with robust whole-body control, TextOp unlocks free-form intent expression and enables smooth transitions across multiple challenging behaviors such as dancing and jumping, within a single continuous motion execution. Extensive real-robot experiments and offline evaluations demonstrate instant responsiveness, smooth whole-body motion, and precise control. The project page and the open-source code are available at https://text-op.github.io/
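The two-level loop described above, where a high-level generator streams short-horizon trajectories conditioned on the current text and a low-level tracker executes them, can be sketched as a toy. This is a minimal illustration under stated assumptions, not the TextOp implementation. `ToyMotionGenerator` replaces the autoregressive motion diffusion model with a bounded step toward a hypothetical per-command target. `ToyTracker` replaces the learned tracking policy with a proportional controller. The pose is a single scalar. The names `COMMAND_TARGETS`, `horizon`, and `replan_below`, along with the flush-on-new-command rule, are all illustrative assumptions.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass

# Hypothetical text-to-target table: a stand-in for the text conditioning
# of the real motion diffusion model. The "pose" is a single scalar here.
COMMAND_TARGETS = {"stand": 0.0, "raise arms": 1.0, "jump": 2.0}


@dataclass
class ToyMotionGenerator:
    """Placeholder for the high-level autoregressive generator.

    Each call returns a short-horizon chunk conditioned on the newest
    planned frame (autoregression) and the current text command.
    """
    horizon: int = 4
    step_size: float = 0.25

    def generate(self, last_frame: float, text: str) -> list[float]:
        # Unknown commands hold the current pose.
        target = COMMAND_TARGETS.get(text, last_frame)
        chunk, frame = [], last_frame
        for _ in range(self.horizon):
            # Bounded step toward the target keeps the plan smooth.
            delta = max(-self.step_size, min(self.step_size, target - frame))
            frame += delta
            chunk.append(frame)
        return chunk


@dataclass
class ToyTracker:
    """Placeholder for the low-level tracking policy (a P-controller)."""
    gain: float = 0.5
    state: float = 0.0

    def step(self, reference: float) -> float:
        self.state += self.gain * (reference - self.state)
        return self.state


def run(commands: dict[int, str], steps: int, replan_below: int = 2):
    """Feed streaming text commands through a generate-then-track loop.

    `commands` maps a control tick to a new instruction. A new instruction
    flushes the buffered plan so the change takes effect on that tick.
    Returns (tick, text, reference, tracked_state) tuples.
    """
    gen, tracker = ToyMotionGenerator(), ToyTracker()
    buffer: deque[float] = deque()
    text, last_ref, log = "stand", 0.0, []
    for t in range(steps):
        if t in commands and commands[t] != text:
            text = commands[t]
            buffer.clear()  # on-the-fly instruction modification
        if len(buffer) < replan_below:
            # Continue from the newest planned frame, or from the last
            # executed reference if the plan was just flushed.
            seed = buffer[-1] if buffer else last_ref
            buffer.extend(gen.generate(seed, text))
        last_ref = buffer.popleft()
        log.append((t, text, last_ref, tracker.step(last_ref)))
    return log


if __name__ == "__main__":
    for row in run({0: "raise arms", 6: "stand"}, steps=10):
        print(row)
```

Running the `__main__` block prints one tuple per tick. On tick 6, when "stand" arrives, the reference immediately turns back toward 0.0 because the buffered "raise arms" frames are discarded rather than played out. A fuller version would likely seed the generator with a window of past frames rather than one scalar and run the tracker at the robot's control rate.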
