2606.05749v1 Jun 04, 2026 cs.CL

MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA

Hongtao Liu

Jian Yang

Qiyao Peng

Kaifeng Chen

Yongqiang Liu

Xiaochen Zhang

Qing Yang

Iterative retrieval-reasoning agents have recently shown promise for multimodal long-document question answering. However, most existing systems maintain a single, ever-growing context that mixes retrieval traces, observations, and intermediate reasoning. As interactions accumulate, key evidence becomes scattered and diluted, which makes multi-hop reasoning noisy. We propose MARDoc, a Memory-Aware Refinement Agent framework that decouples long-document QA into three specialized agents: an Explorer that performs multi-granularity multimodal retrieval, a Refiner that distills interaction traces into structured evidence and reasoning memories, and a Reflector that checks evidence sufficiency and provides targeted feedback. Across iterations, the agents work from a dynamically updated structured memory rather than the full accumulated interaction history. This design reduces context noise while preserving answer-critical facts and their logical dependencies. Experiments on MMLongBench-Doc and DocBench show that MARDoc outperforms same-backbone baselines, demonstrating the effectiveness of structured memory for agentic document QA.
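To make the Explorer / Refiner / Reflector loop concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: every name (`StructuredMemory`, `explorer`, `refiner`, `reflector`, `render_context`, `run`) is hypothetical, keyword matching stands in for multimodal retrieval and LLM calls, and a question is modeled as a list of required facts (one per hop). The property it illustrates is the one the abstract describes: each round's raw retrieval trace is distilled into structured memory and then discarded, so the next round works from memory rather than from the accumulated history.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredMemory:
    """Hypothetical memory: one distilled evidence sentence per required fact, plus a reasoning log."""
    evidence: dict[str, str] = field(default_factory=dict)
    reasoning: list[str] = field(default_factory=list)


def explorer(pages: list[str], query_terms: list[str]) -> list[str]:
    """Toy Explorer: coarse page-level keyword retrieval (stands in for multimodal retrieval)."""
    return [p for p in pages if any(t in p.lower() for t in query_terms)]


def refiner(trace: list[str], required: list[str], memory: StructuredMemory, step: int) -> None:
    """Toy Refiner: keep only sentences that support a still-missing fact; everything else is dropped."""
    for page in trace:
        for sentence in page.split(". "):
            sentence = sentence.strip().rstrip(".")
            for term in required:
                if term not in memory.evidence and term in sentence.lower():
                    memory.evidence[term] = sentence
                    memory.reasoning.append(f"step {step}: {term} <- {sentence}")


def reflector(required: list[str], memory: StructuredMemory) -> tuple[bool, list[str]]:
    """Toy Reflector: sufficient when every required fact is covered; feedback lists the gaps."""
    missing = [t for t in required if t not in memory.evidence]
    return (not missing, missing)


def render_context(memory: StructuredMemory) -> str:
    """What a next-round prompt would contain: the structured memory only, not the raw history."""
    return "\n".join(f"{k}: {v}" for k, v in memory.evidence.items())


def run(pages: list[str], required: list[str], max_iters: int = 3) -> StructuredMemory:
    memory = StructuredMemory()
    query = required[:1]  # start from the first hop only
    for step in range(max_iters):
        trace = explorer(pages, query)  # raw trace for this round; never accumulated
        refiner(trace, required, memory, step)
        sufficient, missing = reflector(required, memory)
        if sufficient:
            break
        query = missing  # targeted feedback drives the next retrieval round
    return memory


# Usage on a toy three-page "document" (fabricated illustration data, not from the paper).
pages = [
    "Cover page. Annual report 2024.",
    "Budget table. Figure 3 shows quarterly trends. Revenue in 2024 was 12 million.",
    "Appendix. Headcount in 2024 was 40 people.",
]
mem = run(pages, ["revenue", "headcount"])
print(render_context(mem))
```

In this toy run, the first round finds only the revenue fact, the Reflector reports `headcount` as missing, and the second round retrieves the appendix page. The distractor sentence about Figure 3 never reaches memory. A real system would replace each function with a model call, but the control flow (retrieve, distill, check sufficiency, re-query on gaps) follows the structure the abstract describes.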
