2603.23934v1 Mar 25, 2026 cs.CV

Analyzing Multi-View Hallucination in Large Vision-Language Models

Revealing Multi-View Hallucination in Large Vision-Language Models

Kyuhong Shim (Citations: 479, h-index: 10)

B. Shim (Citations: 8,362, h-index: 46)

Woo-Soon Park (Citations: 16, h-index: 3)

Insu Lee (Citations: 10, h-index: 2)

Soohyun Kim (Citations: 146, h-index: 6)

J. Jang (Citations: 8, h-index: 2)

Minyoung Noh (Citations: 5, h-index: 1)

Large vision-language models (LVLMs) are increasingly applied to multi-view image inputs captured from different viewpoints. Even so, current LVLMs often confuse or mismatch visual information that comes from different instances or viewpoints, which we call multi-view hallucination. To analyze the problem systematically, we built MVH-Bench, a benchmark of 4,800 question-answer pairs targeting two types of hallucination: cross-instance and cross-view. Experiments show that recent LVLMs struggle to link visual evidence to the correct instance or viewpoint. To address this limitation, we propose Reference Shift Contrastive Decoding (RSCD), a training-free decoding technique that suppresses visual interference through attention masking. On MVH-Bench with Qwen2.5-VL and LLaVA-OneVision, RSCD improves on existing hallucination mitigation methods by up to 21.1 and 34.6 points, demonstrating the effectiveness of the proposed method.

Original Abstract

Large vision-language models (LVLMs) are increasingly being applied to multi-view image inputs captured from diverse viewpoints. However, despite this growing use, current LVLMs often confuse or mismatch visual information originating from different instances or viewpoints, a phenomenon we term multi-view hallucination. To systematically analyze this problem, we construct MVH-Bench, a benchmark comprising 4.8k question-answer pairs targeting two types of hallucination: cross-instance and cross-view. Empirical results show that recent LVLMs struggle to correctly associate visual evidence with its corresponding instance or viewpoint. To overcome this limitation, we propose Reference Shift Contrastive Decoding (RSCD), a training-free decoding technique that suppresses visual interference by generating negative logits through attention masking. Experiments on MVH-Bench with Qwen2.5-VL and LLaVA-OneVision demonstrate that RSCD consistently improves performance by up to 21.1 and 34.6 points over existing hallucination mitigation methods, highlighting the effectiveness of our approach.
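The abstract says RSCD produces negative logits through attention masking and contrasts them with the regular logits, but it does not give the masking rule or the combination formula. The sketch below therefore shows only the generic contrastive decoding step commonly used in hallucination mitigation, not the authors' method. The helper names, the `alpha` and `beta` parameters, and the plausibility cutoff are all assumptions made for illustration.

```python
import math


def contrastive_logits(pos_logits, neg_logits, alpha=1.0, beta=0.1):
    """Generic contrastive decoding step (hypothetical helper, not the paper's code).

    pos_logits: next-token logits from the normal forward pass.
    neg_logits: next-token logits from a second pass in which attention
                is masked (the "negative" branch; the exact mask is assumed).
    alpha:      contrast strength (assumed hyperparameter).
    beta:       plausibility cutoff relative to the top positive probability.
    """
    # Softmax over the positive logits, used only for the plausibility filter.
    peak = max(pos_logits)
    exps = [math.exp(x - peak) for x in pos_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = beta * max(probs)

    adjusted = []
    for prob, pos, neg in zip(probs, pos_logits, neg_logits):
        if prob < cutoff:
            # Tokens the positive pass finds implausible are never promoted.
            adjusted.append(float("-inf"))
        else:
            # Amplify what the positive pass supports and subtract what the
            # masked (negative) pass supports.
            adjusted.append((1 + alpha) * pos - alpha * neg)
    return adjusted


def greedy_token(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy vocabulary of three tokens. The negative branch prefers token 0 even
# more strongly than the positive branch, so the contrast demotes it.
pos = [2.0, 1.8, -3.0]
neg = [2.5, 0.5, -3.0]
plain_choice = greedy_token(pos)
contrast_choice = greedy_token(contrastive_logits(pos, neg))
```

In this toy case plain greedy decoding picks token 0, while the contrasted logits pick token 1, because token 0 is favored mainly by the branch that stands in for visual interference.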

