2603.12529v1 Mar 13, 2026 cs.LG

TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning

Alliot Nagle

Jakhongir Saydaliev

Dhia Garbaya

Michael Gastpar

A. Makkuva

Hyeji Kim

Original Abstract

Large Reasoning Models (LRMs) achieve impressive performance on complex reasoning tasks via Chain-of-Thought (CoT) reasoning, which lets them generate intermediate thinking tokens before arriving at a final answer. However, LRMs often overthink significantly, spending excessive compute even after the answer has already been generated early in the reasoning. Prior work has identified the existence of an optimal reasoning length: truncating reasoning at this point significantly shortens CoT outputs with virtually no change in performance. However, determining optimal CoT lengths for practical datasets is highly non-trivial because they are entirely task- and model-dependent. In this paper, we address this problem directly and design TERMINATOR, an inference-time early-exit strategy for LRMs that mitigates overthinking. The central idea underpinning TERMINATOR is that the first arrival of an LRM's final answer is often predictable; we leverage these first-answer positions to create a novel dataset of optimal reasoning lengths on which TERMINATOR is trained. With this approach, TERMINATOR reduces CoT lengths by 14%-55% on average across four challenging practical datasets (MATH-500, AIME 2025, HumanEval, and GPQA) while outperforming current state-of-the-art methods.
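The abstract does not describe how first-answer positions are located or turned into training targets, so the sketch below is only an illustration of the general idea under stated assumptions, not TERMINATOR's implementation. It assumes a CoT trace already split into ordered reasoning steps, a MATH-style answer written as `\boxed{...}`, and whitespace words as a stand-in for model tokens. The helpers `extract_answer`, `first_answer_step`, `exit_labels`, and `truncation_savings` are hypothetical names, and the 0/1 per-step labeling is one possible way to build "optimal exit" targets for a stop classifier.

```python
import re
from typing import Optional

# Hypothetical answer extractor: returns the last \boxed{...} in a chunk of text,
# a common convention for MATH-style problems. Nested braces are not handled.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(text: str) -> Optional[str]:
    matches = BOXED.findall(text)
    return matches[-1].strip() if matches else None


def first_answer_step(steps: list[str], final_answer: str) -> Optional[int]:
    """Index of the first reasoning step whose extracted answer matches the
    model's final answer, or None if that answer never appears in any step."""
    for i, step in enumerate(steps):
        if extract_answer(step) == final_answer:
            return i
    return None


def exit_labels(steps: list[str], final_answer: str) -> list[int]:
    """Hypothetical per-step targets for a stop classifier:
    0 = keep thinking, 1 = the final answer has already appeared."""
    pos = first_answer_step(steps, final_answer)
    if pos is None:
        return [0] * len(steps)
    return [0] * pos + [1] * (len(steps) - pos)


def truncation_savings(steps: list[str], final_answer: str) -> float:
    """Fraction of CoT length removed by stopping right after the first-answer
    step. Whitespace-separated words stand in for model tokens (an assumption)."""
    lengths = [len(s.split()) for s in steps]
    total = sum(lengths)
    pos = first_answer_step(steps, final_answer)
    if pos is None or total == 0:
        return 0.0
    kept = sum(lengths[: pos + 1])
    return 1.0 - kept / total


if __name__ == "__main__":
    cot = [
        "We need 12 * 7.",
        "12 * 7 = 84, so the answer is \\boxed{84}.",
        "Let me double-check: 10 * 7 = 70 and 2 * 7 = 14.",
        "70 + 14 = 84. Final answer: \\boxed{84}.",
    ]
    print(exit_labels(cot, "84"))
    print(round(truncation_savings(cot, "84"), 3))
```

Running it prints `[0, 1, 1, 1]` and `0.595`: the answer `84` first appears in the second step, so the remaining verification steps (22 of 37 words) are exactly the kind of overthinking an early-exit policy would aim to skip.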