2604.15109v2 Apr 16, 2026 cs.CL

IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation

Jinhao Duan, Kaidi Xu, Hao Fan

Original Abstract

Despite the rapid advancement of Large Language Models (LLMs), uncertainty quantification in LLM generation remains a persistent challenge. Although recent approaches have achieved strong performance by restricting LLMs to short or constrained answer sets, many real-world applications require long-form, free-form text generation. A key difficulty in this setting is that LLMs often produce responses that are semantically coherent yet factually inaccurate, a difficulty compounded by the multifaceted semantics and complex linguistic structure of long-form text. To tackle this challenge, this paper introduces Interrogative Uncertainty Quantification (IUQ), a novel framework that leverages inter-sample consistency and intra-sample faithfulness to quantify the uncertainty in long-form LLM outputs. By utilizing an interrogate-then-respond paradigm, the method provides reliable measures of claim-level uncertainty and of the model's faithfulness. Experimental results across diverse model families and model sizes demonstrate the superior performance of IUQ on two widely used long-form generation datasets. The code is available at https://github.com/louisfanhz/IUQ.
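
The abstract does not spell out how inter-sample consistency is scored, so the following is only a toy illustration of the general idea, not the IUQ algorithm. It assumes that a question about one claim has already been posed to the model several times and that the sampled answers are short strings. The function names `normalize` and `consistency_uncertainty` are hypothetical and do not come from the paper or its repository.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def consistency_uncertainty(answers: list[str]) -> float:
    """Toy score: 1 minus the share of the most common normalized answer.

    Assumption: exact string agreement stands in for semantic agreement.
    A real system would need a semantic comparison instead.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(a) for a in answers)
    top_count = counts.most_common(1)[0][1]
    return 1.0 - top_count / len(answers)


if __name__ == "__main__":
    # All samples agree, so the toy uncertainty is 0.0.
    print(consistency_uncertainty(["Paris", "paris ", " PARIS"]))
    # Two of four samples agree, so the toy uncertainty is 0.5.
    print(consistency_uncertainty(["1912", "1912", "1913", "1915"]))
```

A score near 0 means the sampled answers agree, and a score near 1 means they scatter. Intra-sample faithfulness, the second signal named in the abstract, is not modeled here.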
