2601.22361v1 Jan 29, 2026 cs.CL

MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment

Yupeng Cao (Citations: 3, h-index: 1)

Chengyang He (Citations: 2, h-index: 1)

Yangyang Yu (Citations: 701, h-index: 9)

Ping Wang (Citations: 146, h-index: 7)

K. Subbalakshmi (Citations: 223, h-index: 6)

Abstract

Assessing the veracity of online content has become increasingly critical. Large language models (LLMs) have recently enabled substantial progress in automated veracity assessment, including automated fact-checking and claim verification systems. Typical veracity assessment pipelines break down complex claims into sub-claims, retrieve external evidence, and then apply LLM reasoning to assess veracity. However, existing methods often treat evidence retrieval as a static, isolated step and do not effectively manage or reuse retrieved evidence across claims. In this work, we propose MERMAID, a memory-enhanced multi-agent veracity assessment framework that tightly couples the retrieval and reasoning processes. MERMAID integrates agent-driven search, structured knowledge representations, and a persistent memory module within a Reason-Action style iterative process, enabling dynamic evidence acquisition and cross-claim evidence reuse. By retaining retrieved evidence in an evidence memory, the framework reduces redundant searches and improves verification efficiency and consistency. We evaluate MERMAID on three fact-checking benchmarks and two claim-verification datasets using multiple LLMs, including GPT, LLaMA, and Qwen families. Experimental results show that MERMAID achieves state-of-the-art performance while improving the search efficiency, demonstrating the effectiveness of synergizing retrieval, reasoning, and memory for reliable veracity assessment.
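
The cross-claim evidence reuse described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the evidence memory is keyed or queried, so the `EvidenceMemory` class, the `normalize` helper, and the `fake_search` stand-in for a real search tool are all hypothetical names chosen for illustration. The sketch only shows the general idea that caching retrieved evidence lets a later claim skip a search already run for an earlier one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def normalize(query: str) -> str:
    """Collapse case and whitespace so near-identical queries share a key."""
    return " ".join(query.lower().split())


@dataclass
class EvidenceMemory:
    """Hypothetical evidence store: caches search results by normalized query."""
    search_fn: Callable[[str], List[str]]
    store: Dict[str, List[str]] = field(default_factory=dict)
    search_calls: int = 0  # counts how often the external search is actually hit

    def retrieve(self, query: str) -> List[str]:
        key = normalize(query)
        if key not in self.store:
            # Cache miss: run the external search once and remember the result.
            self.search_calls += 1
            self.store[key] = self.search_fn(query)
        return self.store[key]


def fake_search(query: str) -> List[str]:
    """Stand-in for a real retrieval tool (assumption, for illustration only)."""
    return [f"snippet about {normalize(query)}"]


memory = EvidenceMemory(search_fn=fake_search)

# Two claims whose sub-claim queries overlap on one item.
claims = {
    "claim A": ["Eiffel Tower height", "Eiffel Tower completion year"],
    "claim B": ["eiffel tower  height", "Paris population"],
}

evidence = {
    claim: [memory.retrieve(q) for q in queries]
    for claim, queries in claims.items()
}

print(f"queries issued: 4, external searches run: {memory.search_calls}")
```

Here the second claim's first query matches a cached entry after normalization, so only three of the four queries reach the search function. A real system would likely need semantic rather than exact-string matching, which this sketch does not attempt.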

0 Citations

0 Influential

4.5 Altmetric

22.5 Score
