2601.22446v1 Jan 30, 2026 cs.AI

Anytime Safe PAC Efficient Reasoning

Bingyi Jing (Citations: 31, h-index: 4)

Hao Zeng (Citations: 369, h-index: 5)

Huajun Zeng (Citations: 2, h-index: 1)

Jianguo Huang (Citations: 8, h-index: 1)

Youxin Zhu (Citations: 9, h-index: 1)

Chengyao Yu (Citations: 3, h-index: 1)

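Large reasoning models (LRMs) perform remarkably well on complex tasks but suffer from high computational cost and latency. Selective thinking strategies improve efficiency by routing easy queries to a non-thinking model, but existing approaches often incur uncontrollable errors, particularly in online settings where the non-thinking model's performance loss is only partially observed and the data are non-stationary. To address this, we propose Betting Probably Approximately Correct (B-PAC) reasoning, a principled method for anytime-safe, efficient online reasoning under partial feedback. Specifically, we use inverse propensity scoring estimators to construct test supermartingales for candidate thresholds, then dynamically adjust the routing threshold based on the accumulated statistical evidence of safety. Theoretically, we establish anytime-valid control of the performance loss and the efficiency of B-PAC reasoning. Extensive experiments show that B-PAC reasoning substantially reduces computational overhead, cutting thinking-model usage by up to 81.01% while keeping the performance loss below the user-specified level.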

Original Abstract

Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex tasks but suffer from high computational costs and latency. While selective thinking strategies improve efficiency by routing easy queries to non-thinking models, existing approaches often incur uncontrollable errors, especially in online settings where the performance loss of a non-thinking model is only partially observed and data are non-stationary. To address this, we propose Betting Probably Approximately Correct (B-PAC) reasoning, a principled method that enables anytime safe and efficient online reasoning under partial feedback. Specifically, we utilize inverse propensity scoring estimators to construct test supermartingales for candidate thresholds, and then dynamically adjust the routing threshold based on the accumulated statistical evidence of safety. Theoretically, we establish the anytime-valid performance loss control and the efficiency of B-PAC reasoning. Extensive experiments demonstrate that B-PAC reasoning significantly reduces computational overhead, decreasing thinking model usage by up to 81.01%, while controlling the performance loss below the user-specified level.
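
The abstract only names the ingredients (inverse propensity scoring, test supermartingales, an evidence-driven threshold), so the following is a minimal sketch of how a betting-style safety test for one fixed candidate threshold could look, not the paper's actual algorithm. Everything here is an assumption made for illustration: the names `ips_loss`, `BettingTest`, and `run_stream`, a loss bounded in [0, 1], a known exploration probability `p` with which the thinking model is queried so the loss is revealed, and a fixed bet size rather than an adaptive one.

```python
import random


def ips_loss(loss, observed, p):
    """IPS estimate of a loss in [0, 1] that is revealed with probability p."""
    return loss / p if observed else 0.0


class BettingTest:
    """Wealth process testing H0: expected loss >= epsilon (hypothetical helper)."""

    def __init__(self, epsilon, alpha, p_min, bet_fraction=0.5):
        assert 0 < epsilon < 1 and 0 < alpha < 1 and 0 < p_min <= 1
        assert 0 < bet_fraction <= 1
        self.epsilon = epsilon
        self.alpha = alpha
        # IPS estimates lie in [0, 1/p_min]; this cap keeps every factor >= 0.
        self.lam = bet_fraction / (1.0 / p_min - epsilon)
        self.wealth = 1.0

    def update(self, z):
        # Under H0, E[epsilon - z | past] <= 0, so wealth is a supermartingale.
        self.wealth *= 1.0 + self.lam * (self.epsilon - z)
        return self.wealth

    def certified_safe(self):
        # Ville's inequality: under H0, wealth reaches 1/alpha w.p. at most alpha.
        return self.wealth >= 1.0 / self.alpha


def run_stream(true_loss_rate, steps, p=0.5, epsilon=0.1, alpha=0.05, seed=0):
    """Simulate partial feedback; return the first step certified safe, or None."""
    rng = random.Random(seed)
    test = BettingTest(epsilon, alpha, p_min=p)
    for t in range(1, steps + 1):
        loss = 1.0 if rng.random() < true_loss_rate else 0.0
        observed = rng.random() < p  # thinking model queried with probability p
        test.update(ips_loss(loss, observed, p))
        if test.certified_safe():
            return t
    return None
```

In a router, one such test per candidate threshold could run in parallel, with the routing threshold moved to the most aggressive candidate whose wealth has crossed `1 / alpha`. How the paper selects bets and combines candidates is not stated in the abstract, so that part is left out of the sketch.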
