2602.08864v1 Feb 09, 2026 cs.CL

Understanding Dynamic Compute Allocation in Recurrent Transformers

Ibraheem Muhammad Moosa

Suhas Lohit

Ye Wang

Moitreya Chatterjee

Wenpeng Yin

Original Abstract

Token-level adaptive computation seeks to reduce inference cost by allocating more computation to harder tokens and less to easier ones. However, prior work is primarily evaluated on natural-language benchmarks using task-level metrics, where token-level difficulty is unobservable and confounded with architectural factors, making it unclear whether compute allocation truly aligns with underlying complexity. We address this gap through three contributions. First, we introduce a complexity-controlled evaluation paradigm using algorithmic and synthetic language tasks with parameterized difficulty, enabling direct testing of token-level compute allocation. Second, we propose ANIRA, a unified recurrent Transformer framework that supports per-token variable-depth computation while isolating compute allocation decisions from other model factors. Third, we use this framework to conduct a systematic analysis of token-level adaptive computation across alignment with complexity, generalization, and decision timing. Our results show that compute allocation aligned with task complexity can emerge without explicit difficulty supervision, but such alignment does not imply algorithmic generalization: models fail to extrapolate to unseen input sizes despite allocating additional computation. We further find that early compute decisions rely on static structural cues, whereas online halting more closely tracks algorithmic execution state.
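The contrast the abstract draws between early compute decisions and online halting can be illustrated with a toy sketch. This is not ANIRA and makes no claim about how the paper implements either mechanism: `shared_step`, `online_halt`, and `early_budget` are hypothetical names, the "recurrent block" is plain halving of a number, and a token's "difficulty" is simply its magnitude. The only point is to show per-token variable depth with a weight-tied step, where the depth is chosen either up front from a static cue or during execution from the current state.

```python
from typing import List, Tuple


def shared_step(state: float) -> float:
    """Hypothetical weight-tied block: the same update is reused at every depth."""
    return state / 2.0


def online_halt(state: float, threshold: float = 1.0) -> bool:
    """Hypothetical halting rule, evaluated on the current state before each step."""
    return state <= threshold


def run_online(tokens: List[int], max_steps: int = 16) -> List[Tuple[float, int]]:
    """Iterate each token until its own halting rule fires or max_steps is reached."""
    results = []
    for value in tokens:
        state, steps = float(value), 0
        while steps < max_steps and not online_halt(state):
            state = shared_step(state)
            steps += 1
        results.append((state, steps))
    return results


def early_budget(value: int, max_steps: int = 16) -> int:
    """Hypothetical up-front decision: pick a depth from a static cue
    (the bit length of the input) before any step has run."""
    return min(max_steps, max(0, int(value).bit_length() - 1))


def run_early(tokens: List[int], max_steps: int = 16) -> List[Tuple[float, int]]:
    """Iterate each token for a depth fixed in advance by early_budget."""
    results = []
    for value in tokens:
        budget = early_budget(value, max_steps)
        state = float(value)
        for _ in range(budget):
            state = shared_step(state)
        results.append((state, budget))
    return results


if __name__ == "__main__":
    tokens = [1, 2, 8, 100]
    print([steps for _, steps in run_online(tokens)])  # [0, 1, 3, 7]
    print([steps for _, steps in run_early(tokens)])   # [0, 1, 3, 6]
```

In this toy setting both schemes spend more steps on larger inputs, but they disagree on 100: the static cue stops after 6 steps with the state still at 1.5625, while the state-dependent rule runs a seventh step. That is only an analogy for the distinction the abstract describes, under the assumptions stated above.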
