2604.00445v1 Apr 01, 2026 cs.AI

Towards Reliable Truth-Aligned Uncertainty Estimation in Large Language Models

A. Luu

Citations: 5,973

h-index: 34

Ponhvoan Srey

Citations: 4

h-index: 1

Quang Minh Nguyen

Citations: 38

h-index: 3

Xiaobao Wu

Shanghai Jiao Tong University

Citations: 1,110

h-index: 18

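Uncertainty estimation (UE) aims to improve the reliability of large language models (LLMs) by detecting hallucinations. However, UE metrics often perform inconsistently across configurations, which severely limits their applicability. This work characterizes the phenomenon as "proxy failure": most UE metrics are derived from model behaviour and are not explicitly grounded in the factual correctness of LLM outputs. We show that UE metrics lose their discriminative power in low-information regimes. To address this, we propose Truth AnChoring (TAC), a post-hoc calibration method that maps raw scores to truth-aligned scores. TAC supports learning well-calibrated uncertainty estimates even under noisy, few-shot supervision, and it provides a practical calibration protocol. Our results highlight the limits of treating heuristic UE metrics as direct indicators of truth uncertainty and position TAC as a necessary step toward more reliable uncertainty estimation for LLMs. The code repository is available at https://github.com/ponhvoan/TruthAnchor/.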

Original Abstract

Uncertainty estimation (UE) aims to detect hallucinated outputs of large language models (LLMs) to improve their reliability. However, UE metrics often exhibit unstable performance across configurations, which significantly limits their applicability. In this work, we formalise this phenomenon as proxy failure, since most UE metrics originate from model behaviour, rather than being explicitly grounded in the factual correctness of LLM outputs. With this, we show that UE metrics become non-discriminative precisely in low-information regimes. To alleviate this, we propose Truth AnChoring (TAC), a post-hoc calibration method to remedy UE metrics, by mapping the raw scores to truth-aligned scores. Even with noisy and few-shot supervision, our TAC can support the learning of well-calibrated uncertainty estimates, and presents a practical calibration protocol. Our findings highlight the limitations of treating heuristic UE metrics as direct indicators of truth uncertainty, and position our TAC as a necessary step toward more reliable uncertainty estimation for LLMs. The code repository is available at https://github.com/ponhvoan/TruthAnchor/.
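To make the idea of post-hoc calibration concrete, the sketch below maps raw uncertainty scores to probabilities of an incorrect answer using a small labelled set. It is not the TAC method, whose details the abstract does not give. It uses plain Platt scaling (a one-dimensional logistic fit) as a stand-in, and the function names `fit_platt` and `calibrate`, the scores, and the labels are all hypothetical.

```python
import math


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p(wrong | s) = sigmoid(a * s + b) by gradient descent on log loss.

    Assumption: this is ordinary Platt scaling, used here only as a generic
    example of post-hoc calibration. It is NOT the paper's TAC procedure.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # d(log loss)/da for one example
            grad_b += (p - y)      # d(log loss)/db for one example
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw uncertainty score to an estimated probability of error."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Hypothetical few-shot calibration set: raw uncertainty scores paired with
# truth labels (1 = the model's answer was wrong, 0 = it was correct).
raw_scores = [0.2, 0.5, 0.9, 1.4, 1.8, 2.3, 2.7, 3.1]
is_wrong = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(raw_scores, is_wrong)
low = calibrate(0.3, a, b)   # a low raw score
high = calibrate(3.0, a, b)  # a high raw score
print(f"a={a:.3f}, b={b:.3f}, p_wrong(0.3)={low:.3f}, p_wrong(3.0)={high:.3f}")
```

The fitted map is monotone in the raw score, so it rescales the metric without changing its ranking. A method that targets proxy failure would presumably need more than this when the raw score itself carries little information about correctness.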

