2604.27472v1 Apr 30, 2026 cs.AI

PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations

Xuaner Wu, Qizhen Weng, Weinan Zhang, Xuelong Li, Fangzheng Yan, Sen Fu, Chenjia Bai, Yang Zhang, Jian Zhao, Chenyou Fan, Tian Li, Hai-lin Tang, Xiu Li, Chi Zhang

Abstract

Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the fundamental nature of robot learning as a goal-reaching process that requires understanding temporal task progress. We present PRTS (Primitive Reasoning and Tasking System), a VLA foundation model that reformulates pretraining through Goal-Conditioned Reinforcement Learning. By treating language instructions as goals and employing contrastive reinforcement learning, PRTS learns a unified embedding space where the inner product of state-action and goal embeddings approximates the log of the discounted goal occupancy, that is, the discounted probability of reaching the language-specified goal from the current state-action. This score quantitatively assesses physical feasibility beyond static semantic matching. PRTS draws this dense goal-reachability supervision directly from offline trajectories without reward annotations, and folds it into the VLM backbone via a role-aware causal mask, incurring negligible overhead over vanilla behavior cloning. This paradigm endows the high-level reasoning system with intrinsic goal-reachability awareness, bridging semantic reasoning and temporal task progress, and further benefits goal-conditioned action prediction. Pretrained on 167B tokens of diverse manipulation and embodied-reasoning data, PRTS reaches state-of-the-art performance on LIBERO, LIBERO-Pro, LIBERO-Plus, SimplerEnv, and a real-world suite of 14 complex tasks, with particularly substantial gains on long-horizon, contact-rich, and zero-shot novel-instruction settings. These results confirm that injecting goal-reachability awareness significantly improves both execution success and long-horizon planning of general-purpose robotic foundation policies.
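The contrastive objective mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the actual PRTS loss, encoders, or sampling scheme, so everything below is an assumption: the function name `info_nce_loss`, the toy 2-D embeddings, and the choice of an InfoNCE-style loss in which each state-action embedding is paired with the goal embedding from its own trajectory (the positive) and contrasted against the other goals in the batch (the negatives). In generic contrastive RL, a critic trained this way tends toward a log-ratio of goal-reaching probabilities, up to a goal-dependent term; whether PRTS uses exactly this form is not stated here.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def info_nce_loss(sa_embs, goal_embs):
    """Hypothetical InfoNCE-style loss (not the paper's confirmed objective).

    sa_embs[i] and goal_embs[i] are assumed to come from the same
    trajectory (positive pair); goal_embs[j] with j != i act as negatives.
    The critic score is the inner product of the two embeddings.
    """
    n = len(sa_embs)
    total = 0.0
    for i in range(n):
        logits = [dot(sa_embs[i], goal_embs[j]) for j in range(n)]
        total -= log_softmax(logits)[i]  # cross-entropy toward the positive
    return total / n

# Toy batch: two state-action embeddings and two goal embeddings.
state_actions = [[2.0, 0.0], [0.0, 2.0]]
goals = [[2.0, 0.0], [0.0, 2.0]]
swapped_goals = [goals[1], goals[0]]  # deliberately mismatched pairs

loss_aligned = info_nce_loss(state_actions, goals)
loss_swapped = info_nce_loss(state_actions, swapped_goals)
print(loss_aligned < loss_swapped)
```

Correctly paired embeddings give a lower loss than mismatched ones, which is the pressure that pushes the inner product to score reachable goals above unreachable ones. A real system would replace the toy vectors with learned encoder outputs and draw positives from future steps of offline trajectories.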
