2602.03814v1 Feb 03, 2026 cs.AI

Conformal Thinking: Risk Control for Reasoning on a Compute Budget

A. Suresh

Citations: 33

h-index: 2

Rishi More

Citations: 30

h-index: 4

William Jurayj

Johns Hopkins University

Citations: 65

h-index: 4

Benjamin Van Durme

Citations: 1,229

h-index: 18

Mehrdad Farajtabar

Citations: 8,039

h-index: 37

Daniel Khashabi

Citations: 353

h-index: 9

Eric Nalisnick

Citations: 91

h-index: 5

Xi Wang

UMass Amherst

Citations: 82

h-index: 4

Alvin Zhang

Citations: 63

h-index: 5

Abstract

Reasoning Large Language Models (LLMs) enable test-time scaling: dataset-level accuracy improves as the token budget increases. This motivates adaptive reasoning, which spends tokens when they improve reliability and stops early when additional computation is unlikely to help. However, setting the token budget, as well as the threshold for adaptive reasoning, is a practical challenge that entails a fundamental risk-accuracy trade-off. We reframe the budget-setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident (risking incorrect output) and a novel parametric lower threshold that preemptively stops unsolvable instances (risking premature stoppage). Given a target risk and a validation set, we use distribution-free risk control to optimally specify these stopping mechanisms. For scenarios with multiple budget-controlling criteria, we incorporate an efficiency loss to select the most computationally efficient exiting mechanism. Empirical results across diverse reasoning tasks and models demonstrate the effectiveness of our risk control approach, showing computational efficiency gains from the lower threshold and ensemble stopping mechanisms while adhering to the user-specified risk target.
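
To make the calibration idea concrete, here is a minimal sketch of choosing an upper confidence threshold from a validation set so that the corrected error rate stays below a target risk. It is an illustration under stated assumptions, not the paper's procedure. The trace format, the function names, and the finite-sample correction `(errors + 1) / (n + 1)` (in the style of conformal risk control for a loss bounded by 1) are all assumptions. The scan also assumes that risk does not increase as the threshold becomes more conservative. The parametric lower threshold and the efficiency loss mentioned in the abstract are not modeled.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical trace format: one (confidence, answer_is_correct) pair
# recorded after each reasoning chunk of a validation example.
Trace = Sequence[Tuple[float, bool]]


def stop_step(trace: Trace, upper: float) -> int:
    """Index of the first chunk whose confidence reaches `upper`,
    or the last chunk if the threshold is never reached."""
    for i, (conf, _) in enumerate(trace):
        if conf >= upper:
            return i
    return len(trace) - 1


def evaluate(traces: List[Trace], upper: float) -> Tuple[int, float]:
    """Return (number of wrong answers at the stopping point,
    mean number of chunks consumed) for a given upper threshold."""
    errors, steps = 0, 0
    for trace in traces:
        i = stop_step(trace, upper)
        errors += 0 if trace[i][1] else 1
        steps += i + 1
    return errors, steps / len(traces)


def calibrate_upper(traces: List[Trace], alpha: float,
                    grid: Sequence[float]) -> Optional[float]:
    """Scan thresholds from most to least conservative and keep the
    lowest one seen before the corrected risk first exceeds `alpha`.
    Returns None if even the most conservative threshold fails."""
    n = len(traces)
    chosen = None
    for upper in sorted(grid, reverse=True):
        errors, _ = evaluate(traces, upper)
        corrected = (errors + 1) / (n + 1)  # assumed correction, loss in [0, 1]
        if corrected > alpha:
            break
        chosen = upper
    return chosen


# Hypothetical validation data: 7 easy traces and 2 that are wrong early.
easy = [(0.6, True), (0.9, True), (0.95, True)]
demo_traces: List[Trace] = [easy] * 7 + [
    [(0.7, False), (0.8, False), (0.95, True)],
    [(0.65, False), (0.85, True), (0.9, True)],
]
demo_grid = [0.6, 0.75, 0.9, 1.01]  # 1.01 means "never stop early"
demo_threshold = calibrate_upper(demo_traces, alpha=0.25, grid=demo_grid)
```

On this toy data the scan accepts 1.01, 0.9, and 0.75, then stops at 0.6, where two of the nine stopped answers are wrong, so `demo_threshold` is 0.75. A lower accepted threshold means earlier exits and less compute, which is why the sketch keeps the least conservative threshold that still satisfies the bound.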

8 Citations

1 Influential

18.5 Altmetric

102.5 Score
