2605.14802v1 May 14, 2026 cs.AI

A Heterogeneous Temporal Memory Governance Framework for Long-Term LLM Persona Consistency

Zhao Yang

Wang Huan

Yingshu Li

Haomiao Tu

Lin Hujite

Abstract

Large language models often suffer from fact loss, timeline confusion, persona drift, and reduced stability during long-range interaction, especially with high-noise knowledge bases, context clearing, and cross-model transfer. To address these issues, we introduce ARPM, an external temporal memory governance framework for long-term dialogue. ARPM separates static knowledge memory from dynamic dialogue experience memory and combines vector retrieval, BM25, RRF fusion, dual-temporal reranking, chronological evidence reading, and a controlled analysis protocol for evidence verification and answer binding. Unlike approaches that encode persona consistency into model weights or rely only on long context, ARPM treats continuity as a traceable, auditable, and transferable governance problem. Using engineering logs, we conduct three experiments. First, in a 50-round question-answering setting, we compare signal-to-noise ratios of 1:5 and 1:200+, and distinguish CSV auto-judgment from manual review. Under 1:5, CSV recall accuracy is 54.0%, while manual review raises it to 100.0%. Under 1:200+, the corresponding values are 44.0% and 80.0%. These results show that automatic rules can underestimate recall after supporting evidence enters the prompt. Second, ablation results show that dialogue history retrieval is necessary for recent continuity: disabling it reduces strict accuracy from 100% to 66.7%, and disabling BM25 reduces it to 80.0%, indicating that pure semantic retrieval is insufficient for correction and tracing. Third, under a 5.1-million-character noise substrate, periodic context clearing, and multi-model handoff, ARPM maintains semantic continuity, boundary continuity, and persona consistency, while exposing limits caused by weak protocol compliance. These findings show that long-term persona consistency can be decomposed into governable components and evaluated in a white-box manner.
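The abstract names RRF fusion as the step that combines the vector and BM25 result lists but gives no implementation details. As a minimal sketch of standard Reciprocal Rank Fusion, and not the authors' code, the example below fuses two ranked lists of memory IDs. The function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the sample IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of IDs with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, assumed here;
    the abstract does not state the value ARPM uses.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical memory IDs returned by two retrievers for the same query.
vector_hits = ["m3", "m1", "m7"]  # semantic (vector) ranking
bm25_hits = ["m1", "m9", "m3"]    # lexical (BM25) ranking

fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)  # ['m1', 'm3', 'm9', 'm7']
```

Entries found by both retrievers (`m1`, `m3`) rise above entries found by only one. This is consistent with the ablation finding that removing BM25 hurts accuracy: a lexical match can promote an item that semantic similarity alone ranks lower.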
