2603.21029v1 Mar 22, 2026 cs.AI

KLDrive: Fine-Grained 3D Scene Reasoning for Autonomous Driving based on Knowledge Graph

Zihao Wang

Citations: 12

h-index: 1

Ye Tian

Citations: 14

h-index: 2

Tajana Simunic

Citations: 117

h-index: 6

Jingyi Zhang

Citations: 683

h-index: 11

Xiaoyuan Ren

Citations: 11

h-index: 1

Xiaofan Yu

Citations: 49

h-index: 5

Onat Güngör

Citations: 353

h-index: 9

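Autonomous driving requires reliable reasoning grounded in fine-grained 3D scene information. Fine-grained question answering over multi-sensor data is a natural way to evaluate this capability, but existing perception pipelines and driving-specific large language model (LLM) methods still suffer from unreliable scene information, hallucinations, opaque reasoning, and heavy reliance on task-specific training. This paper proposes KLDrive, the first knowledge-graph-based LLM reasoning framework for fine-grained question answering in autonomous driving. KLDrive addresses these problems with two core components. The first is an energy-based scene fact construction module that integrates evidence from multiple sources into a reliable scene knowledge graph. The second is an LLM agent that performs fact-grounded reasoning within a restricted action space under explicit structural constraints. By combining structured prompts with a small number of examples, the framework is designed to apply to diverse reasoning tasks without heavy task-specific fine-tuning. In experiments on two large-scale autonomous-driving question-answering benchmarks, KLDrive outperformed prior state-of-the-art models, reaching an overall accuracy of 65.04% on NuScenes-QA and a SPICE score of 42.45 on GVQA. On counting, the most difficult factual reasoning task, it improved on the strongest baseline by 46.01 percentage points, which demonstrates reduced hallucination and the effectiveness of combining reliable scene fact construction with explicit reasoning.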

Original Abstract

Autonomous driving requires reliable reasoning over fine-grained 3D scene facts. Fine-grained question answering over multi-modal driving observations provides a natural way to evaluate this capability, yet existing perception pipelines and driving-oriented large language model (LLM) methods still suffer from unreliable scene facts, hallucinations, opaque reasoning, and heavy reliance on task-specific training. We present KLDrive, the first knowledge-graph-augmented LLM reasoning framework for fine-grained question answering in autonomous driving. KLDrive addresses this problem through designing two tightly coupled components: an energy-based scene fact construction module that consolidates multi-source evidence into a reliable scene knowledge graph, and an LLM agent that performs fact-grounded reasoning over a constrained action space under explicit structural constraints. By combining structured prompting with few-shot in-context exemplars, the framework adapts to diverse reasoning tasks without heavy task-specific fine-tuning. Experiments on two large-scale autonomous-driving QA benchmarks show that KLDrive outperforms prior state-of-the-art methods, achieving the best overall accuracy of 65.04% on NuScenes-QA and the best SPICE score of 42.45 on GVQA. On counting, the most challenging factual reasoning task, it improves over the strongest baseline by 46.01 percentage points, demonstrating substantially reduced hallucinations and the benefit of coupling reliable scene fact construction with explicit reasoning.
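
The abstract gives no implementation details, so the following is only a toy sketch of the general idea of answering a counting question from explicit scene facts through a fixed set of allowed operations. Everything in it is an assumption made for illustration: the `SceneFact` fields, the `energy` score with its `max_energy` threshold (standing in for whatever the energy-based module actually computes), and the action names in `ACTIONS`. None of these come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneFact:
    """One candidate fact about an object in the scene (hypothetical schema)."""
    object_id: str
    category: str   # e.g. "car", "pedestrian"
    status: str     # e.g. "moving", "parked"
    energy: float   # assumed score: lower means better supported by evidence


def build_scene_graph(candidates, max_energy=0.5):
    """Keep only candidate facts whose assumed energy is at or below the threshold."""
    return [f for f in candidates if f.energy <= max_energy]


# The only operations a reasoning agent may call in this sketch.
ACTIONS = {
    "filter_category": lambda facts, value: [f for f in facts if f.category == value],
    "filter_status": lambda facts, value: [f for f in facts if f.status == value],
    "count": lambda facts, _value=None: len(facts),
}


def run_plan(facts, plan):
    """Execute a list of (action_name, argument) steps, rejecting unknown actions."""
    result = facts
    for name, arg in plan:
        if name not in ACTIONS:
            raise ValueError(f"action not allowed: {name}")
        result = ACTIONS[name](result, arg)
    return result


candidates = [
    SceneFact("obj1", "car", "moving", 0.1),
    SceneFact("obj2", "car", "parked", 0.2),
    SceneFact("obj3", "car", "moving", 0.9),  # weakly supported, filtered out
    SceneFact("obj4", "pedestrian", "moving", 0.3),
]
graph = build_scene_graph(candidates)

# "How many moving cars are there?" expressed as a plan over allowed actions.
plan = [("filter_category", "car"), ("filter_status", "moving"), ("count", None)]
answer = run_plan(graph, plan)
print(answer)
```

Because the answer is computed from the retained facts instead of being generated freely, a count in this sketch can only be wrong if the facts themselves are wrong, which is the intuition behind pairing reliable fact construction with constrained reasoning.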

0 Citations

0 Influential

5.5 Altmetric

27.5 Score
