2604.12944v1 Apr 14, 2026 cs.CV

Distorted or Fabricated? A Survey on Hallucination in Video LLMs

Yitian Zhang

Yizhou Wang

Yiyang Huang

Northeastern University

Mingyu Zhang

Huimin Zeng

Yun Fu

Liang Shi

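Despite substantial progress in video-language modeling, hallucination in video large language models (Vid-LLMs) remains a significant challenge. Hallucination refers to output that appears plausible yet contradicts the content of the input video. This survey presents a comprehensive analysis of hallucination in Vid-LLMs and introduces a taxonomy that systematically divides it into two core types: dynamic distortion and content fabrication. Each type comprises two subtypes with representative cases. Building on this taxonomy, the survey reviews recent advances in evaluating and mitigating hallucination, covering key benchmarks, metrics, and intervention strategies. It also analyzes the root causes of dynamic distortion and content fabrication, which often stem from limited capacity for temporal representation and insufficient visual grounding. These findings point to promising directions for future research, including the development of motion-aware visual encoders and the integration of counterfactual learning techniques. The survey consolidates scattered findings to foster a systematic understanding of hallucination in Vid-LLMs and lays the groundwork for building robust and reliable video-language systems. A list of related works is available at https://github.com/hukcc/Awesome-Video-Hallucination .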

Original Abstract

Despite significant progress in video-language modeling, hallucinations remain a persistent challenge in Video Large Language Models (Vid-LLMs), referring to outputs that appear plausible yet contradict the content of the input video. This survey presents a comprehensive analysis of hallucinations in Vid-LLMs and introduces a systematic taxonomy that categorizes them into two core types: dynamic distortion and content fabrication, each comprising two subtypes with representative cases. Building on this taxonomy, we review recent advances in the evaluation and mitigation of hallucinations, covering key benchmarks, metrics, and intervention strategies. We further analyze the root causes of dynamic distortion and content fabrication, which often result from limited capacity for temporal representation and insufficient visual grounding. These insights inform several promising directions for future work, including the development of motion-aware visual encoders and the integration of counterfactual learning techniques. This survey consolidates scattered progress to foster a systematic understanding of hallucinations in Vid-LLMs, laying the groundwork for building robust and reliable video-language systems. An up-to-date curated list of related works is maintained at https://github.com/hukcc/Awesome-Video-Hallucination .
