2602.10560v1 Feb 11, 2026 cs.CL

When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Yaorui Shi

Citations: 300

h-index: 9

Tat-Seng Chua

Citations: 3,271

h-index: 31

An Zhang

Citations: 196

h-index: 8

Xiang Wang

Citations: 841

h-index: 14

Leheng Sheng

Citations: 576

h-index: 12

Wenchang Ma

Citations: 279

h-index: 9

Yongtao Zhang

Citations: 49

h-index: 3

Ting Huang

Citations: 449

h-index: 4

Ke Shen

Citations: 446

h-index: 8

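Long-context reasoning is crucial for many real-world applications, yet large language models (LLMs) suffer from performance degradation as context length grows. A recent work, MemAgent, attempted to address this by processing the context chunk by chunk in an RNN-like loop and updating a textual memory that is used for the final answer. However, this naive recurrent memory update has two crucial drawbacks: (i) the memory can grow quickly because it may be updated indiscriminately, even on chunks that contain no evidence; and (ii) the loop has no exit mechanism, so unnecessary computation continues even after sufficient evidence has been collected. To address these issues, we propose GRU-Mem, which incorporates two text-controlled gates for more stable and efficient long-context reasoning. Specifically, in GRU-Mem the memory is updated only when the update gate is open, and the recurrent loop terminates immediately once the exit gate opens. To endow the model with these capabilities, we introduce two reward signals, $r^{\text{update}}$ and $r^{\text{exit}}$, within end-to-end reinforcement learning (RL), rewarding correct updating and exiting behaviors respectively. Experiments on various long-context reasoning tasks demonstrate the effectiveness and efficiency of GRU-Mem, which generally delivers up to 400% faster inference than MemAgent.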

Original Abstract

While reasoning over long context is crucial for various real-world applications, it remains challenging for large language models (LLMs), as they suffer from performance degradation as the context length grows. Recent work MemAgent has tried to tackle this by processing context chunk-by-chunk in an RNN-like loop and updating a textual memory for final answering. However, this naive recurrent memory update faces two crucial drawbacks: (i) memory can quickly explode because it can update indiscriminately, even on evidence-free chunks; and (ii) the loop lacks an exit mechanism, leading to unnecessary computation even after sufficient evidence is collected. To address these issues, we propose GRU-Mem, which incorporates two text-controlled gates for more stable and efficient long-context reasoning. Specifically, in GRU-Mem, the memory only updates when the update gate is open, and the recurrent loop exits immediately once the exit gate is open. To endow the model with such capabilities, we introduce two reward signals, $r^{\text{update}}$ and $r^{\text{exit}}$, within end-to-end RL, rewarding the correct updating and exiting behaviors respectively. Experiments on various long-context reasoning tasks demonstrate the effectiveness and efficiency of GRU-Mem, which generally outperforms the vanilla MemAgent with up to 400% inference speed acceleration.
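
The gated loop described in the abstract can be pictured with a minimal Python sketch. It is an illustration only, not the authors' implementation: `StepOutput`, `gated_memory_loop`, and `toy_step` are hypothetical names, the model call is replaced by a keyword-matching stand-in, and applying the update before checking the exit gate within the same step is an assumption the abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class StepOutput:
    """Hypothetical per-chunk output of the model (names are assumptions)."""
    update: bool      # update gate: is this chunk worth writing into memory?
    exit: bool        # exit gate: is the collected evidence already sufficient?
    new_memory: str   # candidate memory, used only when `update` is True


# Hypothetical signature of the model call: (question, memory, chunk) -> StepOutput
StepFn = Callable[[str, str, str], StepOutput]


def gated_memory_loop(
    question: str, chunks: Iterable[str], step_fn: StepFn
) -> Tuple[str, int]:
    """Process chunks in order; return the final memory and the number of steps run."""
    memory = ""
    steps = 0
    for chunk in chunks:
        steps += 1
        out = step_fn(question, memory, chunk)
        if out.update:
            # Memory changes only when the update gate is open.
            memory = out.new_memory
        if out.exit:
            # Exit gate open: stop reading the remaining chunks.
            break
    return memory, steps


def toy_step(question: str, memory: str, chunk: str) -> StepOutput:
    """Keyword-based stand-in for an LLM call, for demonstration only."""
    has_evidence = "code" in chunk
    new_memory = (memory + " " + chunk).strip() if has_evidence else memory
    # In this toy, a single piece of evidence is treated as sufficient.
    return StepOutput(update=has_evidence, exit=has_evidence, new_memory=new_memory)


chunks = ["filler a", "the code is 42", "filler b", "filler c"]
memory, steps = gated_memory_loop("What is the code?", chunks, toy_step)
print(memory, steps)
```

In this toy run the loop skips the first evidence-free chunk without touching memory and stops after the second chunk, leaving the last two unread. Skipping those unread chunks is where the inference savings described in the abstract would come from.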

3 Citations

0 Influential

15.5 Altmetric

80.5 Score
