2603.07264v1 Mar 07, 2026 cs.RO

Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving

Qi Liu (citations: 186, h-index: 7)
Jiazhuo Li (citations: 4, h-index: 1)
Linjiang Cao (citations: 0, h-index: 0)
Xi Xiong (citations: 82, h-index: 4)

Abstract

Data-efficient learning remains a central challenge in autonomous driving due to the high cost and safety risks of large-scale real-world interaction. Although world-model-based reinforcement learning enables policy optimization through latent imagination, existing approaches often lack explicit mechanisms to encode spatial and kinematic structure essential for driving tasks. In this work, we build upon the Recurrent State-Space Model (RSSM) and propose a kinematics-aware latent world model framework for autonomous driving. Vehicle kinematic information is incorporated into the observation encoder to ground latent transitions in physically meaningful motion dynamics, while geometry-aware supervision regularizes the RSSM latent state to capture task-relevant spatial structure beyond pixel reconstruction. The resulting structured latent dynamics improve long-horizon imagination fidelity and stabilize policy optimization. Experiments in a driving simulation benchmark demonstrate consistent gains over both model-free and pixel-based world-model baselines in terms of sample efficiency and driving performance. Ablation studies further verify that the proposed design enhances spatial representation quality within the latent space. These results suggest that integrating kinematic grounding into RSSM-based world models provides a scalable and physically grounded paradigm for autonomous driving policy learning.
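
The two ideas named in the abstract, feeding vehicle kinematics into the observation encoder and adding geometry-aware supervision on the RSSM latent state, can be illustrated with a toy sketch. This is not the authors' implementation: the class name, all dimensions, the choice of kinematic inputs (speed, yaw rate, steering angle), and the geometric target (lateral offset and heading error) are hypothetical, the weights are random rather than learned, and a plain tanh recurrence stands in for the GRU that RSSMs typically use.

```python
import numpy as np


class KinematicsAwareRSSMSketch:
    """Toy RSSM-style cell with random weights (illustration only, nothing is trained)."""

    def __init__(self, img_dim=64, kin_dim=3, act_dim=2,
                 deter=32, stoch=8, geo_dim=2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.stoch = stoch

        def init(n_in, n_out):
            return self.rng.normal(0.0, 0.1, size=(n_in, n_out))

        # Encoder sees image features AND kinematics (hypothetical fusion by concatenation).
        self.W_enc = init(img_dim + kin_dim, deter)
        # Deterministic path: h_t = f(h_{t-1}, z_{t-1}, a_{t-1}).
        self.W_rec = init(deter + stoch + act_dim, deter)
        # Prior p(z_t | h_t) and posterior q(z_t | h_t, e_t) output mean and log-std.
        self.W_prior = init(deter, 2 * stoch)
        self.W_post = init(2 * deter, 2 * stoch)
        # Auxiliary head for geometry-aware supervision (hypothetical target layout).
        self.W_geo = init(deter + stoch, geo_dim)

    def encode(self, img_feat, kin):
        """Fuse image features with the kinematic vector into one embedding."""
        return np.tanh(np.concatenate([img_feat, kin]) @ self.W_enc)

    def _sample(self, stats):
        """Draw z from a diagonal Gaussian parameterised by (mean, log_std)."""
        mean, log_std = np.split(stats, 2)
        std = np.exp(np.clip(log_std, -5.0, 2.0))
        return mean + std * self.rng.standard_normal(self.stoch)

    def step(self, h, z, action, embed=None):
        """One transition; uses the posterior if an embedding is given, else the prior."""
        h = np.tanh(np.concatenate([h, z, action]) @ self.W_rec)
        if embed is None:
            z = self._sample(h @ self.W_prior)  # imagination, no observation
        else:
            z = self._sample(np.concatenate([h, embed]) @ self.W_post)
        return h, z

    def geometry_loss(self, h, z, geo_target):
        """Mean squared error between the geometry head's prediction and the target."""
        pred = np.concatenate([h, z]) @ self.W_geo
        return float(np.mean((pred - geo_target) ** 2))


model = KinematicsAwareRSSMSketch()
h, z = np.zeros(32), np.zeros(8)
img_feat = np.full(64, 0.1)            # stand-in for CNN image features
kin = np.array([5.0, 0.02, 0.1])       # assumed: speed, yaw rate, steering angle
action = np.array([0.1, 0.0])          # assumed: throttle, steering command

# Observed step: posterior conditioned on the kinematics-aware embedding.
h, z = model.step(h, z, action, embed=model.encode(img_feat, kin))
aux_loss = model.geometry_loss(h, z, np.array([0.3, -0.05]))

# Latent imagination: roll the prior forward without observations.
for _ in range(5):
    h, z = model.step(h, z, action)
```

In a trained model the auxiliary geometry loss would be added to the usual reconstruction and KL terms, so that gradients push the latent state to encode spatial structure; how the paper weights or defines these terms is not stated in the abstract.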
