2601.18383v1 Jan 26, 2026 cs.AI


Dynamic Thinking-Token Selection for Efficient Reasoning in Large Reasoning Models

Zhenyuan Guo, Wenlong Meng, Chen Gong, Wenzhi Chen, Tong Chen, Xing Yu, Chengkun Wei


Abstract

Large Reasoning Models (LRMs) excel at solving complex problems by explicitly generating a reasoning trace before deriving the final answer. However, these extended generations incur substantial memory footprint and computational overhead, bottlenecking LRMs' efficiency. This work uses attention maps to analyze the influence of reasoning traces and uncover an interesting phenomenon: only some decision-critical tokens in a reasoning trace steer the model toward the final answer, while the remaining tokens contribute negligibly. Building on this observation, we propose Dynamic Thinking-Token Selection (DynTS). This method identifies decision-critical tokens and retains only their associated Key-Value (KV) cache states during inference, evicting the remaining redundant entries to optimize efficiency.
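The abstract says which KV entries are kept and which are dropped, but it does not say how a token is judged decision-critical. To make the retain/evict mechanism concrete, here is a toy single-head sketch. It assumes that importance is the cumulative attention mass a cached token receives from later queries, which is a heavy-hitter-style heuristic. It also assumes that a small window of the most recent tokens is always kept. This is not DynTS itself. The names `KVEntry`, `attend`, `evict`, `decode_step`, `budget`, and `recent` are hypothetical.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass
class KVEntry:
    position: int        # index of the token in the generated sequence
    key: list[float]
    value: list[float]
    score: float = 0.0   # ASSUMED importance: attention mass received so far


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: list[float], cache: list[KVEntry]) -> tuple[list[float], list[float]]:
    """Single-head scaled dot-product attention over the cached entries."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, e.key)) for e in cache]
    weights = softmax(logits)
    dim = len(cache[0].value)
    out = [sum(w * e.value[j] for w, e in zip(weights, cache)) for j in range(dim)]
    return out, weights


def evict(cache: list[KVEntry], budget: int, recent: int) -> list[KVEntry]:
    """Keep the `recent` newest entries plus the highest-scoring older ones, `budget` in total."""
    assert 0 <= recent <= budget
    if len(cache) <= budget:
        return cache
    split = len(cache) - recent
    older, newest = cache[:split], cache[split:]
    kept = sorted(older, key=lambda e: e.score, reverse=True)[: budget - recent]
    kept.sort(key=lambda e: e.position)  # restore sequence order
    return kept + newest


def decode_step(cache, position, query, key, value, budget, recent):
    """Append the new token's KV state, attend over the cache, update scores, then evict."""
    cache = cache + [KVEntry(position, key, value)]
    out, weights = attend(query, cache)
    for entry, w in zip(cache, weights):
        entry.score += w  # accumulate how strongly each cached token was attended to
    return out, evict(cache, budget, recent)


# Toy run: 20 decoding steps with random 4-d vectors, a budget of 8 entries,
# and the 2 most recent tokens always retained.
random.seed(0)
cache: list[KVEntry] = []
for pos in range(20):
    q, k, v = ([random.gauss(0, 1) for _ in range(4)] for _ in range(3))
    _, cache = decode_step(cache, pos, q, k, v, budget=8, recent=2)

print([e.position for e in cache])
```

The cache stays at a fixed size of `budget` entries no matter how long the reasoning trace gets. That fixed size is where the memory saving comes from. A real LRM would track scores per layer and per head, and the scoring rule would be whatever criterion DynTS actually uses. The split between a fixed recent window and score-ranked older entries is one plausible design choice among several.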
