2602.21172v1 Feb 24, 2026 cs.AI


NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

I. Rawal (citations: 42; h-index: 3)
Wei Zhan (citations: 8,879; h-index: 42)
Shubh Gupta (citations: 9; h-index: 1)
Yihan Hu (citations: 24; h-index: 3)

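Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLA models face two costly requirements: (1) large-scale data collection and (2) detailed reasoning annotations. In this work, we address both with a model called NoRD (No Reasoning for Driving). Compared with existing VLA models, NoRD achieves competitive performance while using less than 60% of the data and no reasoning annotations, and it cuts token usage by a factor of three. We find that the standard Group Relative Policy Optimization (GRPO) algorithm yields no significant improvement when applied to policies trained on such small, reasoning-free datasets. This limitation stems from "difficulty bias", in which GRPO disproportionately penalizes reward signals from scenarios whose rollouts have high variance. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with less training data and no reasoning overhead, enabling more efficient autonomous systems.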

Original Abstract

Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on <60% of the data and no reasoning annotations, resulting in 3× fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems.
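The difficulty bias mentioned in the abstract can be illustrated with a small sketch. It assumes the commonly described forms of the two advantage estimators: GRPO centers each group of rollout rewards and divides by the group's standard deviation, while Dr. GRPO keeps the centering but drops the division. The function names, the `eps` constant, and the reward values are hypothetical and chosen only for illustration; this is not the paper's training code, and it leaves out the other parts of Dr. GRPO, such as its treatment of response-length normalization.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages with std normalization (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Mean-centered advantages without std normalization (Dr. GRPO-style)."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


# Hypothetical rewards for two scenarios, four rollouts each.
low_variance_scene = [0.50, 0.52, 0.48, 0.50]   # rollouts nearly agree
high_variance_scene = [0.10, 0.90, 0.20, 0.80]  # rollouts disagree strongly

for name, rewards in [("low-variance", low_variance_scene),
                      ("high-variance", high_variance_scene)]:
    grpo = max(abs(a) for a in grpo_advantages(rewards))
    dr = max(abs(a) for a in dr_grpo_advantages(rewards))
    print(f"{name}: max |A| GRPO={grpo:.3f}, Dr. GRPO={dr:.3f}")
```

Under these assumptions, the std-normalized advantages of both scenes end up on roughly the same scale, so the large reward gaps in the high-variance scene are shrunk relative to the tiny gaps in the low-variance scene. Without the division, the high-variance scene contributes a proportionally larger learning signal.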

