2601.07372v1 Jan 12, 2026 cs.CL

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Xingkai Yu (Citations: 12,204; h-index: 10)

Bing-Li Wang (Citations: 12,058; h-index: 9)

Damai Dai (Citations: 11,189; h-index: 12)

Qinyu Chen (Citations: 9,885; h-index: 6)

Wangding Zeng (Citations: 11,212; h-index: 9)

W. Liang (Citations: 13,772; h-index: 14)

Zhenda Xie (Citations: 16,518; h-index: 16)

Zhewen Hao (Citations: 10,625; h-index: 6)

Xin Cheng (Citations: 661; h-index: 6)

Huishuai Zhang (Citations: 341; h-index: 10)

Kezhao Huang (Citations: 232; h-index: 6)

Yukun Li (Citations: 63; h-index: 3)

Han Zhang (Citations: 141; h-index: 4)

Dongyan Zhao (Citations: 234; h-index: 6)

Original Abstract

While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic $N$-gram embedding for $O(1)$ lookup. By formulating the Sparsity Allocation problem, we uncover a U-shaped scaling law that optimizes the trade-off between neural computation (MoE) and static memory (Engram). Guided by this law, we scale Engram to 27B parameters, achieving superior performance over a strictly iso-parameter and iso-FLOPs MoE baseline. Most notably, while the memory module is expected to aid knowledge retrieval (e.g., MMLU +3.4; CMMLU +4.0), we observe even larger gains in general reasoning (e.g., BBH +5.0; ARC-Challenge +3.7) and code/math domains (e.g., HumanEval +3.0; MATH +2.4). Mechanistic analyses reveal that Engram relieves the backbone's early layers from static reconstruction, effectively deepening the network for complex reasoning. Furthermore, by delegating local dependencies to lookups, it frees up attention capacity for global context, substantially boosting long-context retrieval (e.g., Multi-Query NIAH: 84.2 to 97.0). Finally, Engram establishes infrastructure-aware efficiency: its deterministic addressing enables runtime prefetching from host memory, incurring negligible overhead. We envision conditional memory as an indispensable modeling primitive for next-generation sparse models.
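
To make the idea of an $O(1)$ $N$-gram lookup concrete, the following is a minimal sketch of a hashed $N$-gram embedding table. It is not the paper's Engram implementation: the rolling hash, its multiplier, the table size, and the function names `ngram_slot`, `build_table`, and `lookup` are all hypothetical choices made for illustration. The only property it is meant to show is that the table row for each position depends solely on the token IDs in a fixed-size window, so the address can be computed with a constant amount of work per position.

```python
import random


def ngram_slot(token_ids, position, n, table_size):
    """Map the n tokens ending at `position` to a table row.

    Hypothetical hashing scheme for illustration only.
    Returns None when there is not enough left context for a full n-gram.
    """
    start = position - n + 1
    if start < 0:
        return None
    h = 0
    for tok in token_ids[start:position + 1]:
        # Polynomial rolling hash; the multiplier is an arbitrary choice.
        h = (h * 1_000_003 + tok + 1) % table_size
    return h


def build_table(table_size, dim, seed=0):
    """Create a randomly initialised embedding table (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(table_size)]


def lookup(table, token_ids, n):
    """Return one embedding per position; a zero vector where no full n-gram exists."""
    dim = len(table[0])
    out = []
    for pos in range(len(token_ids)):
        slot = ngram_slot(token_ids, pos, n, len(table))
        out.append(list(table[slot]) if slot is not None else [0.0] * dim)
    return out


if __name__ == "__main__":
    table = build_table(table_size=1024, dim=4)
    vectors = lookup(table, [5, 9, 5, 9], n=2)
    # Positions 1 and 3 both end the bigram (5, 9), so they share a row.
    print(vectors[1] == vectors[3])
```

Because the slot indices in this sketch are a pure function of the input token IDs, they could in principle be computed before the forward pass reaches the memory module, which is the kind of deterministic addressing the abstract credits with enabling prefetching from host memory. How Engram actually hashes, combines, or gates its lookups is not specified in the abstract and is not represented here.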

52 Citations

10 Influential

8 Altmetric

112.0 Score
