2604.08457v1 Apr 09, 2026 cs.CV

CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning

Rui Gan (Citations: 98, h-index: 5)
B. Ran (Citations: 116, h-index: 5)
Sikai Chen (Citations: 533, h-index: 14)
Pei Li (Citations: 9, h-index: 2)
XingYou Yang (Citations: 27, h-index: 2)
Kaiyuan Chen (Citations: 27, h-index: 2)
Jun Ma (Citations: 13, h-index: 2)

Abstract

Cooperative autonomous driving requires traffic scene understanding from both vehicle and infrastructure perspectives. While vision-language models (VLMs) show strong general reasoning capabilities, their performance in safety-critical traffic scenarios remains insufficiently evaluated due to the ego-vehicle focus of existing benchmarks. To bridge this gap, we present CrashSight, a large-scale vision-language benchmark for roadway crash understanding using real-world roadside camera data. The dataset comprises 250 crash videos, annotated with 13K multiple-choice question-answer pairs organized under a two-tier taxonomy. Tier 1 evaluates the visual grounding of scene context and involved parties, while Tier 2 probes higher-level reasoning, including crash mechanics, causal attribution, temporal progression, and post-crash outcomes. We benchmark 8 state-of-the-art VLMs and show that, despite strong scene description capabilities, current models struggle with temporal and causal reasoning in safety-critical scenarios. We provide a detailed analysis of failure scenarios and discuss directions for improving VLM crash understanding. The benchmark provides a standardized evaluation framework for infrastructure-assisted perception in cooperative autonomous driving. The CrashSight benchmark, including the full dataset and code, is accessible at https://mcgrche.github.io/crashsight.
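As a rough illustration of how multiple-choice results under a two-tier taxonomy could be scored, the sketch below groups toy question-answer records by tier and by category and reports accuracy for each group. The record fields (`tier`, `category`, `answer`, `prediction`), the category labels, and the sample rows are all hypothetical, chosen only to mirror the abstract's wording; they are not taken from the CrashSight release, whose actual schema and evaluation code may differ.

```python
from collections import defaultdict

# Hypothetical record layout and toy data; NOT the actual CrashSight schema.
records = [
    {"tier": 1, "category": "scene_context", "answer": "B", "prediction": "B"},
    {"tier": 1, "category": "involved_parties", "answer": "A", "prediction": "A"},
    {"tier": 2, "category": "causal_attribution", "answer": "C", "prediction": "D"},
    {"tier": 2, "category": "temporal_progression", "answer": "A", "prediction": "A"},
]


def accuracy_by(rows, key):
    """Return {group: fraction of correct predictions}, grouping rows by `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[key]] += 1
        # A prediction counts as correct only if it matches the answer letter.
        correct[row[key]] += int(row["prediction"] == row["answer"])
    return {group: correct[group] / total[group] for group in total}


tier_accuracy = accuracy_by(records, "tier")
category_accuracy = accuracy_by(records, "category")
print(tier_accuracy)
print(category_accuracy)
```

Reporting accuracy per tier and per category, rather than as one aggregate number, is what would make a gap between scene description and temporal or causal reasoning visible.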

Citations: 1, Influential: 0, Altmetric: 7, Score: 36.0
