2605.15102v1 May 14, 2026 cs.CL

Improving Multi-turn Dialogue Consistency with Self-Recall Thinking

Renning Pang

Citations: 7

h-index: 2

Tian Lan

Citations: 65

h-index: 3

Leyuan Liu

Citations: 1

h-index: 1

Xiaoming Huang

Citations: 221

h-index: 4

Piao Tong

Citations: 24

h-index: 3

Xiaosong Zhang

Citations: 0

h-index: 0

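Multi-turn dialogue systems based on large language models (LLMs) often struggle to track dependencies between non-adjacent turns, which undermines consistency and scalability. As a conversation grows longer, important information becomes sparse and is buried in irrelevant context, and processing the full dialogue history creates a severe efficiency bottleneck. Existing solutions either rely on high-latency external memory or lose details through repeated summarization. This paper proposes Self-Recall Thinking (SRT), a framework for handling long-range contextual dependencies and sparse informative signals in multi-turn dialogue. SRT identifies useful past turns and uses them to generate contextually appropriate responses, letting the model selectively recall and reason over context during inference. The result is an endogenous reasoning process that integrates interpretable recall steps without external modules. SRT has three components: (1) Dependency Construction: identifying useful past turns and converting them into self-recall chains; (2) Capability Initialization: training the model to produce reasoning chains that use recall tokens; (3) Reasoning Improvement: using verifiable rewards to improve accuracy and to optimize recall and reasoning toward correct answers. In experiments on multiple datasets, SRT improved F1 score by 4.7% and reduced end-to-end latency by 14.7% compared with prior methods, balancing reasoning latency against accuracy and outperforming state-of-the-art models.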

Original Abstract

Large language model (LLM) based multi-turn dialogue systems often struggle to track dependencies across non-adjacent turns, undermining both consistency and scalability. As conversations lengthen, essential information becomes sparse and is buried in irrelevant context, while processing the entire dialogue history incurs severe efficiency bottlenecks. Existing solutions either rely on high-latency external memory or lose fine-grained details through iterative summarization. In this paper, we propose Self-Recall Thinking (SRT), a framework designed to address long-range contextual dependency and sparse informative signals in multi-turn dialogue. SRT identifies helpful historical turns and uses them to generate contextually appropriate responses, enabling the model to selectively recall and reason over context during inference. This yields an endogenous reasoning process that integrates interpretable recall steps without external modules. SRT incorporates: (1) Dependency Construction: identifying helpful historical turns and converting them into self-recall chains; (2) Capability Initialization: training the model to produce reasoning chains with recall tokens; (3) Reasoning Improvement: refining accuracy via verifiable rewards to optimize recall and reasoning for correct answers. Experiments on multiple datasets demonstrate that SRT improves F1 score by 4.7% and reduces end-to-end latency by 14.7% over prior methods, achieving a balance between reasoning latency and accuracy, and outperforming state-of-the-art baselines.
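To make the idea of a self-recall chain and a verifiable reward more concrete, here is a minimal Python sketch. It is an illustration only. The abstract does not specify the recall token format, how helpful turns are selected, or which reward is used, so the `<recall>` markers, the function names `build_recall_chain` and `token_f1`, and the choice of token-level F1 as the reward are all assumptions rather than the authors' implementation.

```python
from collections import Counter

# Hypothetical recall markers; the paper's actual recall tokens are not given here.
RECALL_OPEN = "<recall>"
RECALL_CLOSE = "</recall>"


def build_recall_chain(history, helpful_turns):
    """Render the selected past turns as a chain of recall steps.

    history:       list of past turn strings, oldest first.
    helpful_turns: indices of turns assumed to matter for the current query.
    """
    steps = []
    for idx in sorted(set(helpful_turns)):
        if not 0 <= idx < len(history):
            raise IndexError(f"turn index {idx} is outside the dialogue history")
        steps.append(f"{RECALL_OPEN}turn {idx}: {history[idx]}{RECALL_CLOSE}")
    return "\n".join(steps)


def token_f1(prediction, reference):
    """Token-level F1 between two strings, one possible verifiable reward."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


history = [
    "I am allergic to peanuts.",
    "What is the weather like today?",
    "It looks sunny.",
]
# Only turn 0 is relevant to a later question such as "Suggest a snack for me."
chain = build_recall_chain(history, [0])
reward = token_f1("an apple is safe", "an apple is fine")
```

In this sketch the model would reason over `chain` instead of the full `history`, which is the selective-recall behaviour the abstract describes. The reward is checkable against a reference answer without a learned judge, which is what makes it verifiable in this illustration.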
