2603.17839v1 Mar 18, 2026 cs.CL

How do LLMs Compute Verbal Confidence

Simon Osindero

Arthur Conmy

Federico Barbero

D. Kumaran

Viorica Patraucean

Petar Veličković

Original Abstract

Verbal confidence, obtained by prompting LLMs to state their confidence as a number or category, is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions. First, when is confidence computed: just-in-time when requested, or automatically during answer generation and cached for later retrieval? Second, what does verbal confidence represent: token log-probabilities, or a richer evaluation of answer quality? Focusing on Gemma 3 27B and Qwen 2.5 7B, we provide convergent evidence for cached retrieval. Activation steering, patching, noising, and swap experiments reveal that confidence representations emerge at answer-adjacent positions before appearing at the verbalization site. Attention blocking pinpoints the information flow: confidence is gathered from answer tokens, cached at the first post-answer position, then retrieved for output. Critically, linear probing and variance partitioning reveal that these cached representations explain substantial variance in verbal confidence beyond token log-probabilities, suggesting a richer answer-quality evaluation rather than a simple fluency readout. These findings demonstrate that verbal confidence reflects automatic, sophisticated self-evaluation rather than post-hoc reconstruction, with implications for understanding metacognition in LLMs and improving calibration.
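The variance-partitioning argument can be made concrete with a small sketch. This is not the authors' code. It is a minimal illustration, assuming a ridge-regression linear probe and a standard two-predictor commonality analysis, run entirely on synthetic data. All names (`fit_linear_probe`, `partition_variance`, the latent `q`, the simulated hidden states `H`) are hypothetical, and the printed numbers reflect the synthetic setup, not any result from the paper. The quantity of interest is `unique_probe`: the variance in verbal confidence explained by the probe over and above token log-probability.

```python
import numpy as np

# HYPOTHETICAL synthetic data standing in for real model activations:
# q is a latent "answer quality" variable, H mimics hidden states at the
# first post-answer position, logprob mimics the answer's token log-probability.
rng = np.random.default_rng(0)
n, d = 4000, 64
q = rng.normal(size=n)
u = rng.normal(size=d)
u /= np.linalg.norm(u)  # unit direction along which q is encoded in H
H = np.outer(q, u) + 0.5 * rng.normal(size=(n, d))
logprob = 0.5 * q + rng.normal(size=n)      # noisy, partial view of quality
verbal_conf = q + 0.3 * rng.normal(size=n)  # stated confidence tracks quality


def fit_linear_probe(X, y, alpha=1.0):
    """Closed-form ridge regression probe: returns weights w and bias b."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - X_mean @ w


def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def partition_variance(logprob, probe_score, target):
    """Split explained variance in target into unique and shared components."""
    r2_lp = r_squared(logprob[:, None], target)
    r2_probe = r_squared(probe_score[:, None], target)
    r2_both = r_squared(np.column_stack([logprob, probe_score]), target)
    return {
        "unique_logprob": r2_both - r2_probe,
        "unique_probe": r2_both - r2_lp,
        # The shared component can be negative under suppression effects.
        "shared": r2_lp + r2_probe - r2_both,
        "total": r2_both,
    }


# Train the probe on one half; evaluate the partition on the held-out half.
idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2 :]
w, b = fit_linear_probe(H[train], verbal_conf[train])
probe_score = H[test] @ w + b
results = partition_variance(logprob[test], probe_score, verbal_conf[test])

for name, value in results.items():
    print(f"{name:>15}: {value:.3f}")
```

In this sketch the probe is trained and scored on disjoint halves so its predictions are not inflated by overfitting; the partition itself uses in-sample OLS on the held-out half, which is a simplification.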
