2601.00348v1 Jan 01, 2026 cs.CL

Robust Uncertainty Quantification for Factual Generation of Large Language Models

Yuhao Zhang

Zhongliang Yang

Linna Zhou

Abstract

The rapid advancement of large language model (LLM) technology has facilitated its integration into many areas of professional and daily life. However, LLM hallucination remains a critical limitation that significantly compromises the reliability and trustworthiness of AI-generated content. The problem has drawn considerable attention from the scientific community and prompted extensive research on hallucination detection and mitigation. Current methods share a critical limitation: traditional uncertainty quantification approaches are effective mainly within conventional question-answering paradigms, yet show notable deficiencies when confronted with non-canonical or adversarial questioning strategies. This performance gap raises substantial concerns about the dependability of LLM responses in real-world applications that require robust critical thinking. This study aims to fill the gap by proposing an uncertainty quantification scenario for the task of generating text that involves multiple facts. We have carefully constructed a set of trap questions containing fake names. Based on this scenario, we propose a novel and robust uncertainty quantification method (RU). A series of experiments was conducted to verify its effectiveness. The results show that the constructed set of trap questions performs excellently. Moreover, across four different models, the proposed method achieves an average increase of 0.1-0.2 in ROC-AUC over the best-performing baseline method, providing new insights and methods for addressing the hallucination issue of LLMs.
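To make the reported ROC-AUC comparison concrete, the following is a minimal sketch of how such a score could be computed for an uncertainty measure used as a hallucination detector. It is not the paper's RU method. The function name `roc_auc`, the labeling convention (1 = hallucinated, 0 = factual), and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (assumption): 1 = response contains a hallucination.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical uncertainty scores from two made-up estimators.
baseline_uncertainty = [0.9, 0.4, 0.3, 0.5, 0.2, 0.1]
improved_uncertainty = [0.9, 0.7, 0.6, 0.5, 0.2, 0.1]

baseline_auc = roc_auc(baseline_uncertainty, labels)
improved_auc = roc_auc(improved_uncertainty, labels)
print(f"baseline ROC-AUC: {baseline_auc:.3f}")
print(f"improved ROC-AUC: {improved_auc:.3f}")
print(f"gain: {improved_auc - baseline_auc:.3f}")
```

A higher ROC-AUC means the uncertainty score ranks hallucinated responses above factual ones more consistently, so a gain of 0.1-0.2 corresponds to a noticeably better ranking. The pairwise loop is quadratic and intended only for small illustrative inputs.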
