2603.04868v1 Mar 05, 2026 cs.AI

K-Gen: A Multimodal Language-Conditioned Approach for Interpretable Keypoint-Guided Trajectory Generation

Ping Wu

Mingxuan Mu

Guo Yang

Lei Chen

Jianxun Cui

Original Abstract

Generating realistic and diverse trajectories is a critical challenge in autonomous driving simulation. While Large Language Models (LLMs) show promise, existing methods often rely on structured data like vectorized maps, which fail to capture the rich, unstructured visual context of a scene. To address this, we propose K-Gen, an interpretable keypoint-guided multimodal framework that leverages Multimodal Large Language Models (MLLMs) to unify rasterized BEV map inputs with textual scene descriptions. Instead of directly predicting full trajectories, K-Gen generates interpretable keypoints along with reasoning that reflects agent intentions, which are subsequently refined into accurate trajectories by a refinement module. To further enhance keypoint generation, we apply T-DAPO, a trajectory-aware reinforcement fine-tuning algorithm. Experiments on WOMD and nuPlan demonstrate that K-Gen outperforms existing baselines, highlighting the effectiveness of combining multimodal reasoning with keypoint-guided trajectory generation.
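The abstract does not specify how the refinement module turns keypoints into a full trajectory, so the following is only a minimal sketch of the general idea of expanding sparse keypoints into a dense path. It swaps in plain linear interpolation for the learned refinement module; the function name `densify_keypoints`, the `(x, y)` keypoint layout, the timestamps, and the `dt` step are all hypothetical and not taken from the paper.

```python
from bisect import bisect_right


def densify_keypoints(keypoints, times, dt=0.5):
    """Linearly interpolate sparse (x, y) keypoints into a dense trajectory.

    Assumption: this is a simple stand-in for a learned refinement module,
    used purely for illustration. `times` must be strictly increasing.
    Returns a list of (t, x, y) tuples sampled every `dt` seconds.
    """
    if len(keypoints) != len(times) or len(times) < 2:
        raise ValueError("need at least two keypoints, one timestamp each")

    n_steps = int(round((times[-1] - times[0]) / dt)) + 1
    trajectory = []
    for i in range(n_steps):
        # Clamp so that rounding never pushes t past the last keypoint.
        t = min(times[0] + i * dt, times[-1])
        # Index k of the segment [times[k], times[k + 1]] that contains t.
        k = min(bisect_right(times, t) - 1, len(times) - 2)
        w = (t - times[k]) / (times[k + 1] - times[k])
        x = keypoints[k][0] + w * (keypoints[k + 1][0] - keypoints[k][0])
        y = keypoints[k][1] + w * (keypoints[k + 1][1] - keypoints[k][1])
        trajectory.append((t, x, y))
    return trajectory


# Hypothetical keypoints: drive straight for 2 s, then drift left for 2 s.
keypoints = [(0.0, 0.0), (10.0, 0.0), (20.0, 5.0)]
times = [0.0, 2.0, 4.0]
for t, x, y in densify_keypoints(keypoints, times, dt=0.5):
    print(f"t={t:.1f}s  x={x:.2f}  y={y:.2f}")
```

The sketch only illustrates why a small set of keypoints can serve as an interpretable intermediate representation: each one is a human-readable waypoint, and the dense path is derived from them in a separate step.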
