2603.14517v1 Mar 15, 2026 cs.AI

Learning to Forget: Sleep-Inspired Memory Consolidation for Resolving Proactive Interference in Large Language Models

Citations: 2

h-index: 1

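Large language models (LLMs) face a problem known as proactive interference (PI), in which outdated information inside the context window disrupts retrieval of the current value and lowers accuracy. The interference worsens log-linearly as stale associations accumulate, and it remains a bottleneck regardless of context length and despite mitigation attempts through prompt engineering. Biological brains handle a similar problem through sleep-dependent memory consolidation, which involves synaptic downscaling, selective replay, and targeted forgetting. This work proposes SleepGate, a biologically inspired framework that adds a learned sleep cycle over the key-value (KV) cache of transformer-based LLMs. SleepGate introduces three mechanisms: (1) a conflict-aware temporal tagger that detects when a new entry supersedes an earlier one; (2) a lightweight forgetting gate trained to selectively evict or compress stale cache entries; and (3) a consolidation module that merges the surviving entries into compact summaries. These components are activated periodically during inference as sleep micro-cycles controlled by an adaptive entropy-based trigger. The authors formulate a dual-phase training objective that optimizes language modeling in the wake phase and post-consolidation retrieval in the sleep phase. According to the theoretical analysis, SleepGate reduces the interference horizon from O(n) to O(log n). In experiments on a small transformer (4 layers, 793K parameters), SleepGate reaches 99.5% retrieval accuracy at PI depth 5 and 97.0% at depth 10, whereas all five baselines (full KV cache, sliding window, H2O, StreamingLLM, and a decay-only ablation) stay below 18%. The framework offers an architecture-level solution to a problem that prompt engineering cannot resolve.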

Original Abstract

Large language models (LLMs) suffer from proactive interference (PI): outdated information in the context window disrupts retrieval of current values. This interference degrades retrieval accuracy log-linearly as stale associations accumulate, a bottleneck that persists regardless of context length and resists prompt-engineering mitigations. Biological brains resolve an analogous challenge through sleep-dependent memory consolidation: synaptic downscaling, selective replay, and targeted forgetting. We propose SleepGate, a biologically inspired framework that augments transformer-based LLMs with a learned sleep cycle over the key-value (KV) cache. SleepGate introduces three mechanisms: (1) a conflict-aware temporal tagger detecting when new entries supersede old ones; (2) a lightweight forgetting gate trained to selectively evict or compress stale cache entries; and (3) a consolidation module that merges surviving entries into compact summaries. These components activate periodically during inference in sleep micro-cycles, governed by an adaptive entropy-based trigger. We formalize a dual-phase training objective jointly optimizing language modeling during the wake phase and post-consolidation retrieval during the sleep phase. Theoretical analysis shows SleepGate reduces the interference horizon from O(n) to O(log n). In experiments with a small-scale transformer (4 layers, 793K parameters), SleepGate achieves 99.5% retrieval accuracy at PI depth 5 and 97.0% at depth 10, while all five baselines -- full KV cache, sliding window, H2O, StreamingLLM, and decay-only ablation -- remain below 18%. Our framework offers an architecture-level solution that prompt engineering cannot address.
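The abstract's pipeline (tag superseded entries, evict them, and do so only when an entropy-based trigger fires) can be sketched as a toy. This is an illustration under loud assumptions, not the paper's implementation: `CacheEntry`, `tag_superseded`, `sleep_micro_cycle`, `maybe_sleep`, and the fixed `threshold` are hypothetical names; symbolic string keys stand in for KV-cache vectors; and a hard "latest write wins" rule replaces the learned forgetting gate. The consolidation module is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str    # hypothetical symbolic key, standing in for a key vector
    value: str  # hypothetical symbolic value
    step: int   # position at which the entry was written


def attention_entropy(weights):
    """Shannon entropy (in nats) of a normalized attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def tag_superseded(cache):
    """Mark each entry whose key was rewritten at a later step."""
    latest = {}
    for entry in cache:
        if entry.key not in latest or entry.step > latest[entry.key]:
            latest[entry.key] = entry.step
    return [entry.step < latest[entry.key] for entry in cache]


def sleep_micro_cycle(cache):
    """Evict entries tagged as stale, preserving the original order."""
    stale = tag_superseded(cache)
    return [entry for entry, is_stale in zip(cache, stale) if not is_stale]


def maybe_sleep(cache, weights, threshold):
    """Run a micro-cycle only when attention entropy exceeds the threshold."""
    if attention_entropy(weights) > threshold:
        return sleep_micro_cycle(cache)
    return cache


# Key "A" is overwritten four times (five writes), key "B" is written once.
cache = [CacheEntry("A", f"v{i}", i) for i in range(1, 6)]
cache.append(CacheEntry("B", "w1", 6))

# Diffuse attention over all six entries: entropy = ln(6), about 1.79 nats.
uniform = [1 / len(cache)] * len(cache)
compacted = maybe_sleep(cache, uniform, threshold=1.0)
print([(e.key, e.value) for e in compacted])  # [('A', 'v5'), ('B', 'w1')]
```

In this toy, high entropy is read as a sign that attention is spread across competing stale entries, so the trigger fires and only the most recent write per key survives. A sharply peaked distribution leaves the cache untouched.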

1 Citation

0 Influential

0.5 Altmetric

3.5 Score

