2605.06216v1 May 07, 2026 cs.CL

TIDE: Every Layer Knows the Token Beneath the Context

Mehrdad Farajtabar

Citations: 8,039

h-index: 37

Ajay Jaiswal

Citations: 2

h-index: 1

Minsik Cho

Citations: 24

h-index: 3

Lauren Hannah

Citations: 4

h-index: 1

D. Hoang

Citations: 0

h-index: 0

Han-Byul Kim

Citations: 11

h-index: 2

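In this paper, we revisit a design choice that is widely accepted across modern LLMs but insufficiently studied: the token index is looked up once at the input embedding layer and then permanently discarded. This single-injection assumption causes two structural problems. (i) The Rare Token Problem: because the vocabulary follows a Zipf distribution, rare-token embeddings receive only a fraction of the cumulative gradient signal that common tokens receive, and so remain chronically under-trained. (ii) The Contextual Collapse Problem: parameter-limited models map distributionally similar tokens to indistinguishable hidden states. To address these problems, we propose TIDE. TIDE augments the standard transformer with EmbeddingMemory, which consists of K independent MemoryBlocks that map token indices to context-free semantic vectors. These vectors are computed once and injected into every layer through a depth-conditioned softmax router that includes a learnable null bank. Through theoretical and empirical analysis, we show that TIDE addresses the problems associated with injecting token identity only once and improves performance across a range of language modeling and downstream tasks.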

Original Abstract

We revisit a universally accepted but under-examined design choice in every modern LLM: a token index is looked up once at the input embedding layer and then permanently discarded. This single-injection assumption induces two structural failures: (i) the Rare Token Problem, where a Zipf-type distribution of vocabulary causes rare-token embeddings to be chronically under-trained, because they receive only a fraction of the cumulative gradient signal that common tokens receive; and (ii) the Contextual Collapse Problem, where limited-parameter models map distributionally similar tokens to indistinguishable hidden states. As an attempt to address both, we propose TIDE, which augments the standard transformer with EmbeddingMemory: an ensemble of K independent MemoryBlocks that map token indices to context-free semantic vectors, computed once and injected into every layer through a depth-conditioned softmax router with a learnable null bank. We theoretically and empirically establish the benefits of TIDE in addressing the issues associated with single-token identity injection, and show improved performance across multiple language modeling and downstream tasks.
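
The injection mechanism described in the abstract can be illustrated with a small pure-Python sketch. The abstract gives no equations, so everything concrete here is an assumption made for illustration: each MemoryBlock is modeled as a plain lookup table, the router is a per-layer logit vector that depends only on depth, the null bank is a slot that contributes a zero vector (so a layer can opt out of injection), and the result is added residually to the hidden state. The class name `EmbeddingMemorySketch` and all parameters are hypothetical, not the authors' implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class EmbeddingMemorySketch:
    """Hypothetical illustration of per-layer token-identity injection."""

    def __init__(self, vocab_size, dim, num_blocks, num_layers, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Assumption: each of the K MemoryBlocks is a (vocab_size x dim) table.
        self.blocks = [
            [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]
            for _ in range(num_blocks)
        ]
        # Assumption: one logit vector per layer over K banks plus 1 null slot.
        # In a real model these would be learned; here they start at zero.
        self.router_logits = [[0.0] * (num_blocks + 1) for _ in range(num_layers)]

    def lookup(self, token_ids):
        """Computed once per sequence: memory[position][block] -> vector."""
        return [[block[t] for block in self.blocks] for t in token_ids]

    def inject(self, hidden, memory, layer):
        """Add the routed mixture of memory vectors to each hidden state."""
        weights = softmax(self.router_logits[layer])
        bank_weights = weights[:-1]  # the last weight belongs to the null slot
        out = []
        for h, vecs in zip(hidden, memory):
            mixed = list(h)
            for w, v in zip(bank_weights, vecs):
                for i in range(self.dim):
                    mixed[i] += w * v[i]
            out.append(mixed)
        return out


# Usage: the lookup happens once, the injection happens at every layer.
mem = EmbeddingMemorySketch(vocab_size=10, dim=4, num_blocks=3, num_layers=2)
token_ids = [1, 7, 1]
memory = mem.lookup(token_ids)
hidden = [[0.0] * 4 for _ in token_ids]
for layer in range(2):
    # A real transformer layer would also run attention and an MLP here.
    hidden = mem.inject(hidden, memory, layer)
```

Because the memory vectors depend only on the token index, positions 0 and 2 (both token 1) receive the same injected signal at every layer, whatever context surrounds them. Putting all of a layer's router mass on the null slot leaves that layer's hidden states unchanged.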

0 Citations

0 Influential

18.5 Altmetric

92.5 Score
