2604.07922v1 Apr 09, 2026 cs.AI

SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking

Weili Guan, Xuefeng Bai, Kehai Chen, Xinyan Chen, Yibin Chen, Min Zhang, Weiya Huang

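Large reasoning models (LRMs) have transformed complex problem-solving, but they exhibit "overthinking": they generate unnecessarily long reasoning chains. Existing solutions improve token efficiency, but they often sacrifice fine-grained control or risk damaging the logical consistency of the reasoning process. To address this, we propose the Stepwise Adaptive Thinking (SAT) framework. SAT preserves the core reasoning structure while adjusting the reasoning process step by step according to difficulty, removing the unnecessary parts. SAT defines reasoning as a finite-state machine (FSM) and uses several thinking modes (Slow, Normal, Fast, Skip). It uses a lightweight process reward model (PRM) to navigate these states dynamically, compressing easy steps and preserving depth for hard ones. In experiments with 9 LRMs and 7 benchmarks, SAT reduced reasoning tokens by up to 40% while generally maintaining or improving accuracy.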

Original Abstract

Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit pervasive "overthinking", generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice fine-grained control or risk disrupting the logical integrity of the reasoning process. To address this, we introduce Stepwise Adaptive Thinking (SAT), a framework that performs step-level, difficulty-aware pruning while preserving the core reasoning structure. SAT formulates reasoning as a Finite-State Machine (FSM) with distinct thinking modes (Slow, Normal, Fast, Skip). It navigates these states dynamically using a lightweight Process Reward Model (PRM), compressing easy steps while preserving depth for hard ones. Experiments across 9 LRMs and 7 benchmarks show that SAT achieves up to 40% reduction in reasoning tokens while generally maintaining or improving accuracy.
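
To make the FSM idea concrete, here is a minimal sketch of how a per-step score could move a reasoner between the four modes named in the abstract. Only the mode names come from the abstract. Everything else is an assumption made for illustration: the thresholds `low` and `high`, the rule that a high score shifts one mode faster and a low score one mode slower, the `BUDGET` table of per-mode token allowances, and the idea that the PRM emits a score in [0, 1] where higher means an easier step. The paper's actual transition rules and PRM interface may differ.

```python
from enum import IntEnum


class Mode(IntEnum):
    """The four thinking modes named in the abstract, ordered slow to fast."""
    SLOW = 0
    NORMAL = 1
    FAST = 2
    SKIP = 3


# Hypothetical per-mode token allowances, chosen only for illustration.
BUDGET = {Mode.SLOW: 512, Mode.NORMAL: 256, Mode.FAST: 64, Mode.SKIP: 0}


def next_mode(mode: Mode, score: float, low: float = 0.4, high: float = 0.8) -> Mode:
    """One FSM transition driven by a step-level score in [0, 1].

    Assumed rule (not from the paper): a high score means the step looks
    easy, so move one mode faster; a low score moves one mode slower.
    """
    if score >= high:
        return Mode(min(mode + 1, Mode.SKIP))
    if score < low:
        return Mode(max(mode - 1, Mode.SLOW))
    return mode


def run(scores: list[float], start: Mode = Mode.NORMAL) -> list[Mode]:
    """Apply the transition once per reasoning step and record the modes."""
    mode = start
    trace = []
    for score in scores:
        mode = next_mode(mode, score)
        trace.append(mode)
    return trace


def token_budget(trace: list[Mode]) -> int:
    """Total tokens allowed across all steps under the assumed BUDGET table."""
    return sum(BUDGET[m] for m in trace)
```

For example, `run([0.9, 0.9, 0.5, 0.2, 0.1, 0.1])` speeds up over the first two easy steps and then backs off to `Mode.SLOW` as the scores drop, so the allowance is spent on the hard steps rather than spread evenly. In a real system the scores would come from the PRM rather than a fixed list.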
