2602.24195v1 Feb 27, 2026 cs.AI

Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume

Gregory Kang Ruey Lau

Citations: 134

h-index: 6

Hieu Dao

Citations: 39

h-index: 4

Nicole Lin

Citations: 23

h-index: 2

K. H. Low

Citations: 3,867

h-index: 39

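Multimodal large language models (MLLMs) are highly capable, but they sometimes produce plausible yet erroneous outputs, which makes reliable deployment difficult. Accurate uncertainty metrics can improve performance by routing unreliable queries to human experts or larger models. Existing uncertainty metrics, however, have practical limitations: they apply only to specific modalities, depend on external tools, or are computationally expensive. This work proposes UMPIRE, an uncertainty quantification framework for MLLMs that works efficiently across various input and output modalities without external tools. Relying only on the model's own internal modality features, UMPIRE computes the incoherence-adjusted semantic volume of the MLLM's responses to a given task instance, capturing both the global semantic diversity of the samples and the local incoherence of the responses. The work sets out uncertainty desiderata for MLLMs and provides a theoretical analysis of UMPIRE's design. In extensive experiments, UMPIRE consistently outperforms existing metrics on image, audio, and video-text benchmarks, including in adversarial and out-of-distribution settings. The authors also show that UMPIRE generalizes to non-text output tasks such as image and audio generation.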

Original Abstract

Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurate uncertainty metrics could enable escalation of unreliable queries to human experts or larger models for improved performance. However, existing uncertainty metrics have practical constraints, such as being designed only for specific modalities, reliant on external tools, or computationally expensive. We introduce UMPIRE, a training-free uncertainty quantification framework for MLLMs that works efficiently across various input and output modalities without external tools, relying only on the models' own internal modality features. UMPIRE computes the incoherence-adjusted semantic volume of sampled MLLM responses for a given task instance, effectively capturing both the global semantic diversity of samples and the local incoherence of responses based on internal model confidence. We propose uncertainty desiderata for MLLMs and provide theoretical analysis motivating UMPIRE's design. Extensive experiments show that UMPIRE consistently outperforms baseline metrics in error detection and uncertainty calibration across image, audio, and video-text benchmarks, including adversarial and out-of-distribution settings. We also demonstrate UMPIRE's generalization to non-text output tasks, including image and audio generation.
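The abstract does not give UMPIRE's formula, so the following is only a minimal sketch of what an "incoherence-adjusted semantic volume" could look like, under stated assumptions. It assumes each sampled response has an embedding vector and a model confidence `p` in (0, 1], scales each unit-norm embedding by the hypothetical weight `2 - p`, and scores the sample set with the log-determinant of the identity plus the resulting Gram matrix. The function names, the weighting, and the identity regularizer are illustrative choices, not taken from the paper.

```python
import math


def normalize(vec):
    """Return vec scaled to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return log_det


def incoherence_adjusted_volume(embeddings, confidences):
    """Hypothetical uncertainty score for one task instance.

    embeddings:  one vector per sampled response (assumed to come from the
                 model's own internal features).
    confidences: one probability-like value in (0, 1] per response.
    Assumption: a response with lower confidence is stretched by the
    factor (2 - p), so it contributes more to the volume.
    """
    scaled = []
    for vec, p in zip(embeddings, confidences):
        weight = 2.0 - p  # illustrative incoherence weight, not from the paper
        scaled.append([weight * x for x in normalize(vec)])
    n = len(scaled)
    # Identity plus Gram matrix keeps the determinant positive even when
    # several responses are identical.
    gram = [
        [sum(a * b for a, b in zip(scaled[i], scaled[j])) + (1.0 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return log_det_spd(gram)


# Three identical responses vs. three mutually orthogonal ones.
same = [[1.0, 0.0, 0.0]] * 3
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
confident = [1.0, 1.0, 1.0]
unsure = [0.5, 0.5, 0.5]

print(incoherence_adjusted_volume(same, confident))
print(incoherence_adjusted_volume(diverse, confident))
print(incoherence_adjusted_volume(diverse, unsure))
```

In this sketch the score grows when the sampled responses spread out semantically (global diversity) and when individual responses carry lower confidence (local incoherence), which mirrors the two signals the abstract names. A score like this could then be thresholded to decide which queries to escalate to a human expert or a larger model.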

1 Citation

0 Influential

19.5 Altmetric

98.5 Score

