2602.04215v2 Feb 04, 2026 cs.RO

OAT: Ordered Action Tokenization

Chaoqi Liu, Xiaoshen Han, Jiawei Gao, Yue Zhao, Haonan Chen, Yilun Du

Abstract

Autoregressive policies offer a compelling foundation for scalable robot learning by enabling discrete abstraction, token-level reasoning, and flexible inference. However, applying autoregressive modeling to continuous robot actions requires an effective action tokenization scheme. Existing approaches rely either on analytical discretization methods that produce prohibitively long token sequences or on learned latent tokenizers that lack structure, which limits their compatibility with next-token prediction. In this work, we identify three desiderata for action tokenization (high compression, total decodability, and a left-to-right causally ordered token space) and introduce Ordered Action Tokenization (OAT), a learned action tokenizer that satisfies all three. OAT discretizes action chunks into an ordered sequence of tokens using a transformer with registers, finite scalar quantization, and ordering-inducing training mechanisms. The resulting token space aligns naturally with autoregressive generation and enables prefix-based detokenization, yielding an anytime trade-off between inference cost and action fidelity. Across more than 20 tasks spanning four simulation benchmarks and real-world settings, autoregressive policies equipped with OAT consistently outperform prior tokenization schemes and diffusion-based baselines, while offering significantly greater flexibility at inference time.
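As a rough illustration of two ideas named in the abstract, the sketch below shows a generic form of finite scalar quantization (bound each latent scalar, round it to a small set of integer levels, and pack the result into one token id) together with prefix truncation of a token sequence. It is not OAT's implementation: the level counts in `LEVELS`, the function names, and the use of `None` as a placeholder for dropped tokens are all assumptions made for this example, and the learned encoder and decoder are omitted entirely.

```python
import math

# Hypothetical number of quantization levels per latent dimension.
# Odd values are assumed so the integer grid is symmetric around zero.
LEVELS = [5, 5, 5]


def fsq_quantize(z, levels=LEVELS):
    """Bound each latent scalar with tanh, then round it to an integer level."""
    codes = []
    for value, n_levels in zip(z, levels):
        half = (n_levels - 1) / 2  # 5 levels -> integers in {-2, ..., 2}
        codes.append(int(round(half * math.tanh(value))))
    return codes


def codes_to_token(codes, levels=LEVELS):
    """Pack per-dimension integer codes into one token id (mixed-radix)."""
    token = 0
    for code, n_levels in zip(codes, levels):
        token = token * n_levels + (code + (n_levels - 1) // 2)
    return token


def token_to_codes(token, levels=LEVELS):
    """Invert codes_to_token, recovering the per-dimension integer codes."""
    codes = []
    for n_levels in reversed(levels):
        codes.append(token % n_levels - (n_levels - 1) // 2)
        token //= n_levels
    return codes[::-1]


def keep_prefix(tokens, k, missing=None):
    """Keep the first k tokens and mark the remainder as missing."""
    return tokens[:k] + [missing] * (len(tokens) - k)


codes = fsq_quantize([0.0, 10.0, -10.0])
token = codes_to_token(codes)
recovered = token_to_codes(token)
prefix = keep_prefix([7, 3, 9, 1], 2)
```

With these assumed levels the codebook has 5 × 5 × 5 = 125 entries, and no learned codebook is needed because the grid is implicit. The `keep_prefix` helper only mimics the interface of prefix-based detokenization; whether a decoder can reconstruct useful actions from a truncated sequence depends on the ordering-inducing training that the abstract attributes to OAT.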
