2601.03066v1 Jan 06, 2026 cs.CL

Do LLMs Encode Functional Importance of Reasoning Tokens?

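Large language models solve complex tasks by generating long reasoning chains, which raises accuracy but also increases computational cost and makes it harder to isolate the functionally relevant reasoning. Prior work has tried to shorten these chains through probabilistic sampling, heuristics, or supervision from frontier models, but it offers limited insight into whether models internally encode token-level functional importance for answer generation. This study examines that gap diagnostically and proposes greedy pruning, a likelihood-preserving deletion procedure that iteratively removes the reasoning tokens whose removal degrades model likelihood the least under a specified objective. When the pruned chains are evaluated in a knowledge distillation framework, students trained on them outperform a baseline compressed under frontier-model supervision at matched reasoning lengths. The analysis also reveals systematic pruning patterns and shows that attention scores can predict greedy pruning ranks, suggesting that models encode a nontrivial functional importance structure over reasoning tokens.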

Original Abstract

Large language models solve complex tasks by generating long reasoning chains, achieving higher accuracy at the cost of increased computational cost and reduced ability to isolate functionally relevant reasoning. Prior work on compact reasoning shortens such chains through probabilistic sampling, heuristics, or supervision from frontier models, but offers limited insight into whether models internally encode token-level functional importance for answer generation. We address this gap diagnostically and propose greedy pruning, a likelihood-preserving deletion procedure that iteratively removes reasoning tokens whose removal minimally degrades model likelihood under a specified objective, yielding length-controlled reasoning chains. We evaluate pruned reasoning in a distillation framework and show that students trained on pruned chains outperform a frontier-model-supervised compression baseline at matched reasoning lengths. Finally, our analysis reveals systematic pruning patterns and shows that attention scores can predict greedy pruning ranks, further suggesting that models encode a nontrivial functional importance structure over reasoning tokens.
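The deletion loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the objective, so `score` is a hypothetical callable standing in for the model likelihood of the answer given the remaining reasoning tokens, and `toy_score` with its `TOY_WEIGHTS` table is an invented placeholder used only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_prune(
    tokens: Sequence[str],
    score: Callable[[List[str]], float],
    target_len: int,
) -> Tuple[List[str], List[int]]:
    """Iteratively delete the token whose removal keeps `score` highest.

    Returns the pruned chain and the original indices in removal order
    (earlier removal = judged less important by the scorer).
    """
    kept = list(range(len(tokens)))  # original indices still in the chain
    removal_order: List[int] = []
    target_len = max(target_len, 0)

    while len(kept) > target_len:
        best_pos, best_score = 0, float("-inf")
        for pos in range(len(kept)):
            # Score the chain with the token at `pos` deleted.
            candidate = kept[:pos] + kept[pos + 1:]
            s = score([tokens[i] for i in candidate])
            if s > best_score:  # strict '>' keeps the earliest token on ties
                best_pos, best_score = pos, s
        removal_order.append(kept.pop(best_pos))

    return [tokens[i] for i in kept], removal_order


# Hypothetical stand-in for a model likelihood: an additive weight per token.
# A real scorer would query a language model instead.
TOY_WEIGHTS = {"hmm": 0, "so": 1, "=": 10, "+": 15, "2": 20, "3": 20, "5": 30}


def toy_score(chain: List[str]) -> float:
    return float(sum(TOY_WEIGHTS.get(tok, 5) for tok in chain))


chain = ["hmm", "so", "2", "+", "3", "=", "5"]
pruned, order = greedy_prune(chain, toy_score, target_len=4)
print(pruned)  # ['2', '+', '3', '5']
print(order)   # [0, 1, 5]
```

Each step scores every remaining single-token deletion, so pruning a chain of n tokens down to a fixed length takes on the order of n² scorer calls. With a real model that cost is likely to dominate, and the removal order is the kind of rank that the abstract says attention scores can predict.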
