2603.17662v1 Mar 18, 2026 cs.CV

FINER: Hallucinations in Multimodal Large Language Models (MLLMs) When Answering Fine-Grained Negative Queries

FINER: MLLMs Hallucinate under Fine-grained Negative Queries

Zeynep Akata

Citations: 1,547

h-index: 21

Rui Xiao

Citations: 70

h-index: 3

Sanghwan Kim

Citations: 55

h-index: 2

Yongqin Xian

Citations: 2,948

h-index: 10

S. Alaniz

Citations: 564

h-index: 13

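Multimodal large language models (MLLMs) struggle with hallucinations, particularly when answering fine-grained queries. This problem is underrepresented in existing benchmarks, which focus on coarse image-related questions. In this work, we introduce FIne-grained NEgative queRies (FINER) together with two benchmarks, FINER-CompreCap and FINER-DOCCI. Using FINER, we analyze hallucinations in four settings: multi-object, multi-attribute, multi-relation, and "what" questions. The benchmark results show that MLLMs hallucinate when fine-grained mismatches co-occur with elements that are genuinely present in the image. To address this, we propose FINER-Tuning, a method based on Direct Preference Optimization (DPO) that uses FINER-inspired data. Fine-tuning four frontier MLLMs with FINER-Tuning yields gains of up to 24.2% on hallucinations in our benchmarks, while also improving performance on eight existing hallucination suites and enhancing general multimodal capabilities across six benchmarks. Code, benchmarks, and models are available at: https://explainableml.github.io/finer-project/.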

Original Abstract

Multimodal large language models (MLLMs) struggle with hallucinations, particularly with fine-grained queries, a challenge underrepresented by existing benchmarks that focus on coarse image-related questions. We introduce FIne-grained NEgative queRies (FINER), alongside two benchmarks: FINER-CompreCap and FINER-DOCCI. Using FINER, we analyze hallucinations across four settings: multi-object, multi-attribute, multi-relation, and "what" questions. Our benchmarks reveal that MLLMs hallucinate when fine-grained mismatches co-occur with genuinely present elements in the image. To address this, we propose FINER-Tuning, leveraging Direct Preference Optimization (DPO) on FINER-inspired data. Finetuning four frontier MLLMs with FINER-Tuning yields up to 24.2% gains (InternVL3.5-14B) on hallucinations from our benchmarks, while simultaneously improving performance on eight existing hallucination suites, and enhancing general multimodal capabilities across six benchmarks. Code, benchmark, and models are available at https://explainableml.github.io/finer-project/.
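
The abstract names Direct Preference Optimization (DPO) as the training objective behind FINER-Tuning but does not spell it out. As a rough illustration only, the sketch below computes the standard DPO loss for a single preference pair from sequence-level log-probabilities. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example; the abstract does not say how FINER-Tuning builds its pairs or sets its hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    All inputs are summed log-probabilities of a full response under
    either the trainable policy or the frozen reference model.
    beta=0.1 is an illustrative default, not a value from the paper.
    """
    # How much more (or less) likely each response became relative
    # to the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; larger means the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

One plausible pairing for a fine-grained negative query, not confirmed by the abstract, would treat an answer that rejects the false detail as the chosen response and an answer that accepts it as the rejected one. When the policy matches the reference on both responses, the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the chosen response.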

0 Citations

0 Influential

10.5 Altmetric

52.5 Score
