2603.21376v1 Mar 22, 2026 cs.AI

A transformer architecture alteration to incentivise externalised reasoning

Puria Radmard

Edward James Young

Elizabeth Pavlova

M. Koroliuk

Karthik Viswanathan

C. Tice

Abstract

We propose a new architectural change, and post-training pipeline, for making LLMs more verbose reasoners by teaching a model to truncate forward passes early. We augment an existing transformer architecture with an early-exit mechanism at intermediate layers and train the model to exit at shallower layers when the next token can be predicted without deep computation. After a calibration stage, we incentivise the model to exit as early as possible while maintaining task performance using reinforcement learning. We provide preliminary results to this effect for small reasoning models, showing that they learn to adaptively reduce computations across tokens. We predict that, applied at the right scale, our approach can minimise the amount of excess computation that reasoning models have at their disposal to perform non-myopic planning using their internal activations, reserving this only for difficult-to-predict tokens.
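
The early-exit idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the exit decision is made or how the reinforcement-learning reward is shaped. The sketch below assumes a simple confidence-threshold exit rule and a linear depth penalty, and every name in it (`ToyEarlyExitStack`, `depth_penalised_reward`, `threshold`, `lam`) is hypothetical. Random residual layers stand in for transformer blocks.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_matrix(rows, cols, rng, scale=0.5):
    """Random Gaussian matrix; a stand-in for trained weights."""
    return [[rng.gauss(0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyEarlyExitStack:
    """Hypothetical stand-in for a transformer with exits at some layers."""

    def __init__(self, n_layers=6, d_model=8, vocab=5, exit_layers=(1, 3), seed=0):
        rng = random.Random(seed)
        self.layers = [make_matrix(d_model, d_model, rng) for _ in range(n_layers)]
        self.unembed = make_matrix(vocab, d_model, rng)  # shared output head
        self.exit_layers = set(exit_layers)

    def forward(self, h, threshold):
        """Return (next-token distribution, number of layers actually run)."""
        for i, W in enumerate(self.layers):
            # Residual update; real blocks would use attention and an MLP.
            h = [hi + math.tanh(u) for hi, u in zip(h, matvec(W, h))]
            if i in self.exit_layers:
                probs = softmax(matvec(self.unembed, h))
                # Assumed exit rule: stop once the top token is confident enough.
                if max(probs) >= threshold:
                    return probs, i + 1
        return softmax(matvec(self.unembed, h)), len(self.layers)


def depth_penalised_reward(task_reward, layers_used, n_layers, lam=0.1):
    """Assumed reward shape: task reward minus a penalty on the depth fraction used."""
    return task_reward - lam * layers_used / n_layers


model = ToyEarlyExitStack()
hidden = [0.1] * 8
probs_full, used_full = model.forward(hidden, threshold=2.0)    # never exits early
probs_early, used_early = model.forward(hidden, threshold=0.0)  # exits at first exit layer
```

With a threshold above 1 the confidence test can never pass, so all six layers run; with a threshold of 0 the pass stops at the first exit layer. Under the assumed reward, two passes with equal task reward are ranked by depth, so the shallower one scores higher.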
