2605.03269v1 May 05, 2026 cs.RO

RLDX-1 Technical Report

Suhyeok Jang (Citations: 12; h-index: 2)
Dongyoung Kim (Citations: 112; h-index: 7)
John Won (Citations: 21; h-index: 2)
Jinwoo Shin (Citations: 48; h-index: 3)
Kangwook Lee (Citations: 150; h-index: 4)
Taeyoung Kim (Citations: 32; h-index: 2)
Huiwon Jang (Citations: 700; h-index: 8)
Dohyeong Kim (Citations: 6; h-index: 2)
Seonil Son (Citations: 110; h-index: 4)
K. Choe (Citations: 27; h-index: 2)
Myungkyu Koo (Citations: 33; h-index: 2)
Beomjun Kim (Citations: 42; h-index: 3)
Byung-Jun Yoon (Citations: 4; h-index: 1)
C. Jang (Citations: 2; h-index: 1)
Daewon Choi (Citations: 53; h-index: 3)
Dongsu Han (Citations: 56; h-index: 3)
Donguk Lee (Citations: 84; h-index: 5)
H. Kwon (Citations: 41; h-index: 4)
Hojin Jeon (Citations: 33; h-index: 2)
Jaehyun Kang (Citations: 48; h-index: 3)
Joonwoo Ahn (Citations: 267; h-index: 6)
Junhyeon Park (Citations: 8; h-index: 2)
Junyoung Sung (Citations: 3; h-index: 1)
Kyungmin Lee (Citations: 317; h-index: 9)
MinSung Yoon (Citations: 11; h-index: 2)
S. Joo (Citations: 5; h-index: 2)
Seungcheol Park (Citations: 3; h-index: 1)
Seung-Mo Cho (Citations: 17; h-index: 2)
Seungjun Moon (Citations: 217; h-index: 7)
Yong Dong (Citations: 7; h-index: 2)
Yongjin Cho (Citations: 94; h-index: 3)
Youngchan Kim (Citations: 2; h-index: 1)
H. Ahn (Citations: 2; h-index: 1)
H. Ryu (Citations: 19; h-index: 2)
Jo-Ping Chang (Citations: 2; h-index: 1)
J. Park (Citations: 35; h-index: 2)
Jungwoo Park (Citations: 6; h-index: 2)
J. Cho (Citations: 2; h-index: 1)
Junhyeok Park (Citations: 49; h-index: 2)
Manoj Bhadu (Citations: 11; h-index: 2)
Nayoung Oh (Citations: 30; h-index: 4)
Sangjun Kim (Citations: 83; h-index: 4)
Sangwoo Kim (Citations: 5; h-index: 2)
Seung-tae Shim (Citations: 5; h-index: 2)
Seungjun Lee (Citations: 26; h-index: 2)
Seungyup Ka (Citations: 4; h-index: 2)
Sung-Po Yang (Citations: 3; h-index: 1)
W. Jung (Citations: 42; h-index: 2)
Yash Shukla (Citations: 38; h-index: 4)
Y. Bae (Citations: 27; h-index: 2)
Jae-sung Bae (Citations: 4,280; h-index: 39)
Jihyuk Lee (Citations: 21; h-index: 2)
Jimin Lee (Citations: 27; h-index: 2)
Min-Jun Han (Citations: 13; h-index: 2)
S. Kim (Citations: 129; h-index: 7)
Chang Hwan Kim (Citations: 42; h-index: 3)
Haze Lee (Citations: 18; h-index: 2)
Heecheol Kim (Citations: 306; h-index: 8)
H. Choi (Citations: 2; h-index: 1)
Hyunsoo Shin (Citations: 2; h-index: 1)
Jaeheon Jung (Citations: 4; h-index: 1)
Jaewoo Kim (Citations: 11; h-index: 2)
Jinwook Kim (Citations: 236; h-index: 10)
Joonsoon Kim (Citations: 2; h-index: 1)
Junwon Lee (Citations: 86; h-index: 4)
Kwang-Hoe Kim (Citations: 3; h-index: 1)
Seung-Wook Kim (Citations: 11; h-index: 2)
Yeonjae A. Lee (Citations: 5; h-index: 2)

Abstract

While Vision-Language-Action models (VLAs) have made remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e., broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks that require broader functional capabilities (e.g., motion awareness, memory-aware decision making, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including synthesis of training data for rare manipulation scenarios, learning procedures specialized for human-like manipulation, and inference optimizations for real-time deployment. Through empirical evaluation, we show that RLDX-1 consistently outperforms recent frontier VLAs (e.g., $\pi_{0.5}$ and GR00T N1.6) across both simulation benchmarks and real-world tasks that require broad functional capabilities beyond general versatility. In particular, on ALLEX humanoid tasks RLDX-1 achieves a success rate of 86.8%, while $\pi_{0.5}$ and GR00T N1.6 achieve around 40%, highlighting the ability of RLDX-1 to control a high-DoF humanoid robot under diverse functional demands. Together, these results position RLDX-1 as a promising step toward reliable VLAs for complex, contact-rich, and dynamic real-world dexterous manipulation.
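
The abstract describes MSAT only at a high level: modality-specific streams combined through cross-modal joint self-attention. As a rough illustration of that general idea, and not the authors' implementation, the sketch below gives each stream its own query/key/value projections and then runs a single attention pass over the concatenated tokens of all streams. The stream names (`vision`, `touch`, `action`), the dimensions, the random weights, and every function name are hypothetical choices made for this example.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Random matrix standing in for learned weights or token embeddings."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)]
            for _ in range(rows)]

def joint_self_attention(streams, weights):
    """Single-head joint attention over several token streams.

    streams: dict name -> tokens, shape [n_tokens][d]
    weights: dict name -> (Wq, Wk, Wv), each [d][d], one set per stream
    """
    names = list(streams)
    q, k, v = [], [], []
    for name in names:
        wq, wk, wv = weights[name]          # modality-specific projections
        q += matmul(streams[name], wq)
        k += matmul(streams[name], wk)
        v += matmul(streams[name], wv)
    d = len(q[0])
    # One attention pass over the concatenated sequence: every token can
    # attend to tokens from every stream, including its own.
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    mixed = matmul(attn, v)
    # Split the mixed tokens back into their original streams.
    out, start = {}, 0
    for name in names:
        n = len(streams[name])
        out[name] = mixed[start:start + n]
        start += n
    return out, attn

rng = random.Random(0)
d = 4  # hypothetical embedding width
streams = {
    "vision": rand_matrix(3, d, rng),   # 3 hypothetical image tokens
    "touch": rand_matrix(2, d, rng),    # 2 hypothetical tactile tokens
    "action": rand_matrix(5, d, rng),   # 5 hypothetical action tokens
}
weights = {name: tuple(rand_matrix(d, d, rng) for _ in range(3)) for name in streams}
out, attn = joint_self_attention(streams, weights)
```

In this sketch the per-stream weights keep modality-specific processing separate, while the shared attention matrix (10 x 10 for the token counts above) is the only place where information crosses between streams. A real model would add multiple heads, output projections, normalization, and feed-forward layers per stream; none of those details are given in the abstract.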

