2601.11913v1 Jan 17, 2026 cs.CL

LSTM-MAS: An LSTM-Based Multi-Agent System for Long-Context Understanding

LSTM-MAS: A Long Short-Term Memory Inspired Multi-Agent System for Long-Context Understanding

Lei Bai (Citations: 42, h-index: 4)
Jiakang Yuan, Fudan University (Citations: 570, h-index: 14)
Yichen Jiang, University of North Carolina at Chapel Hill (Citations: 587, h-index: 9)
Peng Ye (Citations: 8, h-index: 1)
Chongjun Tu (Citations: 332, h-index: 6)
Tao Chen (Citations: 7, h-index: 1)

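Effectively processing long contexts remains a fundamental challenge for large language models (LLMs). Existing single-LLM methods mainly reduce the context window or optimize the attention mechanism, but they often incur additional computational cost or remain limited in how far the context length can be extended. Multi-agent frameworks can ease these limitations, yet they are still at risk of error accumulation and hallucination propagation. In this work, we design LSTM-MAS, a multi-agent system for long-context understanding that is inspired by the Long Short-Term Memory (LSTM) architecture and emulates its hierarchical information flow and gated memory mechanisms. Specifically, LSTM-MAS arranges agents in a chain in which each node consists of a worker agent for segment-level comprehension, a filter agent for redundancy reduction, a judge agent for continuous error detection, and a manager agent that globally regulates information propagation and retention. These roles are analogous to the LSTM's input gate, forget gate, constant error carousel unit, and output gate. This design enables controlled information transfer and selective modeling of long-term dependencies across text segments, which can effectively avoid error accumulation and hallucination propagation. We evaluated the proposed method extensively. Compared with CoA, the previous best multi-agent approach, our model achieves improvements of 40.93%, 43.70%, 121.57%, and 33.12% on NarrativeQA, Qasper, HotpotQA, and MuSiQue, respectively.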

Original Abstract

Effectively processing long contexts remains a fundamental yet unsolved challenge for large language models (LLMs). Existing single-LLM-based methods primarily reduce the context window or optimize the attention mechanism, but they often incur additional computational cost or remain constrained in the context length they can reach. While multi-agent-based frameworks can mitigate these limitations, they remain susceptible to the accumulation of errors and the propagation of hallucinations. In this work, we draw inspiration from the Long Short-Term Memory (LSTM) architecture to design a Multi-Agent System called LSTM-MAS, emulating LSTM's hierarchical information flow and gated memory mechanisms for long-context understanding. Specifically, LSTM-MAS organizes agents in a chained architecture, where each node comprises a worker agent for segment-level comprehension, a filter agent for redundancy reduction, a judge agent for continuous error detection, and a manager agent that globally regulates information propagation and retention, analogous to an LSTM's input gate, forget gate, constant error carousel unit, and output gate. These novel designs enable controlled information transfer and selective long-term dependency modeling across textual segments, which can effectively avoid error accumulation and hallucination propagation. We conducted an extensive evaluation of our method. Compared with the previous best multi-agent approach, CoA, our model achieves improvements of 40.93%, 43.70%, 121.57%, and 33.12% on NarrativeQA, Qasper, HotpotQA, and MuSiQue, respectively.
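The chained node structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives only the four agent roles, so the prompts, the `Memory` class, the function names, the word-count segmentation, and the rule that the manager keeps a note only when the judge answers "OK" are all assumptions made for illustration. The `llm` argument is a hypothetical callable standing in for any model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class Memory:
    """Notes carried along the chain (loosely analogous to an LSTM cell state)."""
    notes: List[str] = field(default_factory=list)


def split_into_segments(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` words (assumed scheme)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def run_node(segment: str, question: str, memory: Memory, llm: LLM) -> Memory:
    """One chain node: worker -> filter -> judge -> manager (assumed control flow)."""
    known = "; ".join(memory.notes)
    # Worker agent: segment-level comprehension.
    draft = llm(f"WORKER\nQuestion: {question}\nSegment: {segment}")
    # Filter agent: drop content redundant with what is already stored.
    filtered = llm(f"FILTER\nKnown: {known}\nNew: {draft}")
    # Judge agent: check the filtered claim against the segment.
    verdict = llm(f"JUDGE\nSegment: {segment}\nClaim: {filtered}")
    # Manager agent (simplified here to a rule): keep only accepted, non-empty notes.
    if verdict.strip().upper().startswith("OK") and filtered.strip():
        memory.notes.append(filtered.strip())
    return memory


def answer(text: str, question: str, llm: LLM, segment_size: int = 200) -> str:
    """Pass memory through the chain of nodes, then produce a final answer."""
    memory = Memory()
    for segment in split_into_segments(text, segment_size):
        memory = run_node(segment, question, memory, llm)
    notes = "; ".join(memory.notes)
    return llm(f"MANAGER\nQuestion: {question}\nNotes: {notes}")
```

The point of the sketch is that a rejected segment contributes nothing to the memory handed to the next node, which is one simple way to keep an error in one segment from propagating down the chain.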
