2604.01202v3 Apr 01, 2026 cs.AI

Therefore I am. I Think

Sai Rajeswar

Citations: 85

h-index: 5

Esakkivel Esakkiraja

Citations: 0

h-index: 0

D. Akhiyarov

Citations: 30

h-index: 2

Rajagopal Venkatesaramani

Citations: 70

h-index: 5

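This paper asks whether a large language reasoning model, when it makes a choice, thinks first and then decides, or decides first and then thinks. We present evidence that detectable, early-stage decisions in reasoning models influence the thinking process. Specifically, we show that a simple linear probe can predict tool-calling decisions with very high accuracy from activations recorded before the model generates text, in some cases before even a single reasoning token is produced. Activation steering experiments support this causal relationship: changing the decision direction makes deliberation excessively long and, in many cases, reverses the model's behavior (from 7% to 79%, depending on the model and benchmark). Behavioral analysis further shows that when the decision direction is changed, the thinking process often justifies the change rather than resisting it. Taken together, these results suggest that reasoning models can settle on an action choice before deliberating in text.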

Original Abstract

We consider the question: when a large language reasoning model makes a choice, did it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe successfully decodes tool-calling decisions from pre-generation activations with very high confidence, and in some cases, even before a single reasoning token is produced. Activation steering supports this causally: perturbing the decision direction leads to inflated deliberation, and flips behavior in many examples (between 7% and 79%, depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought process often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.
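The two techniques named in the abstract, a linear probe on pre-generation activations and steering along the decision direction, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the activations are synthetic Gaussian vectors rather than hidden states from a real model, the probe is a simple difference-of-means classifier (the abstract does not say how the authors fit theirs), and the function names, dimensions, and steering strength are hypothetical.

```python
import numpy as np


def make_synthetic_activations(n=400, dim=32, seed=0):
    """Synthetic stand-in for pre-generation activations (assumption).

    Label 1 = "will call a tool", label 0 = "will not". The two classes
    are shifted apart along one hidden unit-norm direction.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)
    signs = 2.0 * labels - 1.0  # map {0, 1} to {-1, +1}
    acts = rng.normal(size=(n, dim)) + 3.0 * np.outer(signs, direction)
    return acts, labels


def fit_mean_difference_probe(acts, labels):
    """Fit a linear probe from class means (hypothetical probe choice).

    Returns a unit-norm weight vector w and a bias b such that
    acts @ w + b > 0 predicts label 1.
    """
    mu_pos = acts[labels == 1].mean(axis=0)
    mu_neg = acts[labels == 0].mean(axis=0)
    w = mu_pos - mu_neg
    w /= np.linalg.norm(w)
    b = -0.5 * (mu_pos + mu_neg) @ w  # threshold at the midpoint of the means
    return w, b


def probe_predict(acts, w, b):
    """Binary prediction of the decision from activations."""
    return (acts @ w + b > 0).astype(int)


def steer(acts, w, alpha):
    """Shift activations along the probe direction by alpha."""
    return acts + alpha * w


if __name__ == "__main__":
    acts, labels = make_synthetic_activations()
    train_x, train_y = acts[:300], labels[:300]
    test_x, test_y = acts[300:], labels[300:]

    w, b = fit_mean_difference_probe(train_x, train_y)
    accuracy = (probe_predict(test_x, w, b) == test_y).mean()
    print("held-out probe accuracy:", accuracy)

    # Push "tool call" examples toward the "no tool call" side.
    steered = steer(test_x[test_y == 1], w, alpha=-10.0)
    flip_rate = (probe_predict(steered, w, b) == 0).mean()
    print("fraction of positive examples flipped:", flip_rate)
```

In a real experiment the synthetic arrays would be replaced by hidden states captured from the model before generation, and a flip would be measured on the model's actual tool-calling behavior rather than on the probe's own prediction, which this sketch uses only as a stand-in.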

1 Citation

0 Influential

2.5 Altmetric

13.5 Score
