2604.04930v1 Apr 06, 2026 cs.CL

Early Stopping for Large Reasoning Models via Confidence Dynamics

Meisam Razaviyayn (Citations: 9,144; h-index: 35)
Parsa Hosseini (Citations: 37; h-index: 4)
Sumit Nawathe (Citations: 7; h-index: 2)
Mahdi Salmani (Citations: 2; h-index: 1)
S. Feizi (Citations: 14,281; h-index: 50)

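Large reasoning models use long reasoning processes to solve complex problems, but this extended reasoning incurs substantial computational cost and can degrade performance through overthinking. The key challenge is deciding when the model should stop reasoning and give its final answer. In this work, we analyze the confidence of intermediate answers generated during reasoning and observe two characteristic behaviors: correct reasoning trajectories often reach a high-confidence answer early, whereas incorrect ones go through long, unproductive reasoning and show less stable confidence dynamics. Building on these observations, we propose CoDE-Stop (Confidence Dynamics Early Stop), an early stopping method that uses the confidence dynamics of intermediate answers to decide when to end reasoning. CoDE-Stop can be integrated into existing models without additional training. We evaluate CoDE-Stop with multiple models on a range of reasoning and science benchmarks. Compared with existing early stopping methods, CoDE-Stop achieves a better accuracy-compute tradeoff and cuts token usage by 25-50% relative to standard full-length reasoning. We also analyze confidence dynamics during reasoning, giving insight into how confidence evolves along correct and incorrect trajectories.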

Original Abstract

Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the final answer. In this work, we study the confidence of intermediate answers during reasoning and observe two characteristic behaviors: correct reasoning trajectories often reach high-confidence answers early, while incorrect rollouts tend to produce long, unproductive reasoning traces and exhibit less reliable confidence dynamics. Motivated by these observations, we propose CoDE-Stop (Confidence Dynamics Early Stop), an early stopping method that leverages the dynamics of intermediate answer confidence to decide when to terminate reasoning, requiring no additional training and easily integrating into existing models. We evaluate CoDE-Stop on diverse reasoning and science benchmarks across multiple models. Compared to prior early stopping methods, it achieves a more favorable accuracy-compute tradeoff and reduces total token usage by 25-50% compared to standard full-length reasoning. In addition, we provide analyses of confidence dynamics during reasoning, offering insights into how confidence changes in both correct and incorrect trajectories.
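
The abstract does not spell out the stopping rule, so the following is only a hypothetical illustration of what a confidence-based early stop can look like in general; it is not CoDE-Stop itself. It assumes a confidence score in [0, 1] is already available for the intermediate answer at each reasoning checkpoint. The function names, the `threshold` and `patience` parameters, and the "stay above a threshold for several consecutive checkpoints" rule are all assumptions made for this sketch.

```python
from typing import Optional, Sequence


def early_stop_step(
    confidences: Sequence[float],
    threshold: float = 0.9,   # assumed value, not taken from the paper
    patience: int = 2,        # assumed value, not taken from the paper
) -> Optional[int]:
    """Return the index of the checkpoint at which to stop, or None.

    Hypothetical rule: stop once the intermediate-answer confidence has
    stayed at or above `threshold` for `patience` consecutive checkpoints.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    streak = 0
    for step, conf in enumerate(confidences):
        # Extend the streak on a confident checkpoint, reset it otherwise.
        streak = streak + 1 if conf >= threshold else 0
        if streak >= patience:
            return step
    return None


def token_savings(stop_step: Optional[int], tokens_per_step: Sequence[int]) -> float:
    """Fraction of tokens saved by stopping at `stop_step` versus the full trace."""
    total = sum(tokens_per_step)
    if stop_step is None or total == 0:
        return 0.0
    used = sum(tokens_per_step[: stop_step + 1])
    return 1.0 - used / total
```

With this toy rule, a trace whose confidence rises early and stays high is cut short, while a trace with unstable confidence runs to completion. How CoDE-Stop actually measures and combines confidence dynamics is described in the paper, not here.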

0 Citations · 0 Influential · 25 Altmetric · 125.0 Score
