2603.00694v1 Feb 28, 2026 cs.RO

Wild-Drive: Off-Road Scene Captioning and Path Planning via Robust Multi-modal Routing and Efficient Large Language Model

Zihan Wang

Xu Li

Benwu Wang

Xieyuanli Chen

Dong Kong

Kailin Lyu

Yinan Du

Yiming Peng

Haoyang Che

Wenkai Zhu

Abstract

Explainability and transparent decision-making are essential for the safe deployment of autonomous driving systems. Scene captioning summarizes environmental conditions and risk factors in natural language, improving transparency, safety, and human-robot interaction. However, most existing approaches target structured urban scenarios. In off-road environments they are vulnerable to single-modality degradations caused by rain, fog, snow, and darkness, and they lack a unified framework that jointly models structured scene captioning and path planning. To bridge this gap, we propose Wild-Drive, an efficient framework for off-road scene captioning and path planning. Wild-Drive adopts modern multimodal encoders and introduces MoRo-Former, a task-conditioned modality-routing bridge that adaptively aggregates reliable information under degraded sensing. It then integrates an efficient large language model (LLM) with a planning token and a gated recurrent unit (GRU) decoder to generate structured captions and predict future trajectories. We also build the OR-C2P Benchmark, which covers structured off-road scene captioning and path planning under diverse sensor corruption conditions. Experiments on the OR-C2P dataset and a self-collected dataset show that Wild-Drive outperforms prior LLM-based methods and remains more stable under degraded sensing. The code and benchmark will be publicly available at https://github.com/wangzihanggg/Wild-Drive.
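The abstract names two mechanisms without giving their internals: a task-conditioned modality-routing bridge (MoRo-Former) and a planning token whose state drives a GRU decoder that predicts future trajectories. The following is a hypothetical, pure-Python sketch of those two ideas under stated assumptions, not the authors' implementation. The assumptions are as follows. Routing is modeled as a scaled dot-product softmax gate between a task query and per-modality features. The modality set (camera, LiDAR) and every name (`route_modalities`, `TinyGRUDecoder`, `rollout`) are illustrative. The fused feature stands in for the LLM's planning-token hidden state. Weights are random and untrained, and biases are omitted.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route_modalities(modality_feats, task_query):
    """Hypothetical task-conditioned routing: score each modality feature
    against a task query, softmax the scores, and return the weighted sum."""
    scale = 1.0 / math.sqrt(len(task_query))
    weights = softmax([dot(task_query, f) * scale for f in modality_feats])
    fused = [sum(w * f[i] for w, f in zip(weights, modality_feats))
             for i in range(len(task_query))]
    return fused, weights


def matvec(W, v):
    return [dot(row, v) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRUDecoder:
    """Hypothetical GRU waypoint decoder (untrained, bias-free) that rolls
    out 2-D positions starting from a planning-token-like hidden state."""

    def __init__(self, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.Wz, self.Wr, self.Wn = (rand_matrix(hidden_dim, 2, rng) for _ in range(3))
        self.Uz, self.Ur, self.Un = (rand_matrix(hidden_dim, hidden_dim, rng) for _ in range(3))
        self.Wo = rand_matrix(2, hidden_dim, rng)  # hidden state -> (dx, dy)

    def step(self, x, h):
        # Standard GRU gates: update z, reset r, candidate n.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        uh = matvec(self.Un, h)
        n = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(self.Wn, x), r, uh)]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def rollout(self, plan_token_state, horizon):
        h, pos, traj = list(plan_token_state), [0.0, 0.0], []
        for _ in range(horizon):
            h = self.step(pos, h)  # previous waypoint is the step input
            dx, dy = matvec(self.Wo, h)
            pos = [pos[0] + dx, pos[1] + dy]  # accumulate predicted offsets
            traj.append(tuple(pos))
        return traj


if __name__ == "__main__":
    camera = [0.0, 0.0, 0.0, 0.0]  # toy stand-in for a degraded (dark) camera
    lidar = [0.9, 0.1, 0.4, 0.2]
    query = [1.0, 0.0, 0.5, 0.0]   # hypothetical planning-task query
    fused, w = route_modalities([camera, lidar], query)
    traj = TinyGRUDecoder(hidden_dim=4).rollout(fused, horizon=6)
    print([round(x, 3) for x in w], len(traj))
```

In this toy setup the uninformative camera feature scores lower against the task query, so the gate shifts weight toward LiDAR before decoding. A real system would learn the query, features, and decoder weights end to end rather than fix them by hand.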
