2604.13395v1 Apr 15, 2026 cs.AI

Quantifying and Understanding Uncertainty in Large Reasoning Models

Chen Zhao (Citations: 119, h-index: 2)

Mengdi Huai (Citations: 1,566, h-index: 21)

Yangyi Li (Citations: 108, h-index: 6)

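Large reasoning models (LRMs) have recently shown considerable progress in complex reasoning. Quantifying the uncertainty of LRM generations is crucial, but existing methods often fall short because they provide no finite-sample guarantees for reasoning-answer generation. Conformal prediction (CP) has drawn attention as a distribution-free, model-agnostic methodology for constructing statistically rigorous uncertainty sets. However, existing CP methods do not account for the logical connection between the reasoning trace and the final answer. Prior work also fails to explain where the uncertainty coverage of LRMs comes from, since it tends to overlook the specific training factors that drive valid reasoning. In particular, it is hard to separate reasoning quality from answer correctness when quantifying uncertainty, and harder still to provide theoretical guarantees for computationally efficient explanation methods at the same time. To address these challenges, we first propose a new methodology that quantifies uncertainty over the reasoning-answer structure with statistical guarantees. We then develop a unified example-to-step explanation framework based on Shapley values that provably identifies the training examples and key reasoning steps needed to preserve those guarantees. We also provide a theoretical analysis of the proposed methods. Extensive experiments on challenging reasoning datasets validate their effectiveness.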

Original Abstract

Large Reasoning Models (LRMs) have recently demonstrated significant improvements in complex reasoning. While quantifying generation uncertainty in LRMs is crucial, traditional methods are often insufficient because they do not provide finite-sample guarantees for reasoning-answer generation. Conformal prediction (CP) stands out as a distribution-free and model-agnostic methodology that constructs statistically rigorous uncertainty sets. However, existing CP methods ignore the logical connection between the reasoning trace and the final answer. Additionally, prior studies fail to interpret the origins of uncertainty coverage for LRMs as they typically overlook the specific training factors driving valid reasoning. Notably, it is challenging to disentangle reasoning quality from answer correctness when quantifying uncertainty, while simultaneously establishing theoretical guarantees for computationally efficient explanation methods. To address these challenges, we first propose a novel methodology that quantifies uncertainty in the reasoning-answer structure with statistical guarantees. Subsequently, we develop a unified example-to-step explanation framework using Shapley values that identifies a provably sufficient subset of training examples and their key reasoning steps to preserve the guarantees. We also provide theoretical analyses of our proposed methods. Extensive experiments on challenging reasoning datasets verify the effectiveness of the proposed methods.
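For readers unfamiliar with the finite-sample guarantee the abstract refers to, the following is a minimal sketch of standard split conformal prediction. It is not the reasoning-answer method proposed in the paper. The function names, the nonconformity score (one minus the model's confidence in an answer), and all numbers are hypothetical and chosen only for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial
        # "include everything" set can carry the guarantee.
        return math.inf
    return sorted(scores)[k - 1]


def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose nonconformity score is at most the threshold."""
    return [c for c, s in candidate_scores.items() if s <= threshold]


# Hypothetical calibration scores: 1 - confidence assigned to the correct answer.
calibration_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80]
threshold = conformal_quantile(calibration_scores, alpha=0.2)

# Hypothetical scores for the candidate answers to a new question.
candidates = {"A": 0.15, "B": 0.65, "C": 0.90}
print(threshold, prediction_set(candidates, threshold))
```

If the calibration examples and the test example are exchangeable, a set built this way contains the correct answer with probability at least 1 - alpha, whatever the underlying model or data distribution. The sketch scores only final answers; accounting for the reasoning trace as well is the gap the abstract says existing CP methods leave open.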

0 Citations

0 Influential

10.5 Altmetric

52.5 Score
