2606.05749v1 Jun 04, 2026 cs.CL

MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA

Hongtao Liu, Jian Yang, Qiyao Peng, Kaifeng Chen, Yongqiang Liu, Xiaochen Zhang, Qing Yang

Iterative retrieval-reasoning agents have recently shown promise for multimodal long-document question answering. However, most existing systems maintain a single growing context that mixes retrieval traces, observations, and intermediate reasoning. As interactions accumulate, key evidence becomes scattered and diluted, making multi-hop reasoning noisy. We propose MARDoc, a Memory-Aware Refinement Agent framework that decouples long-document QA into three specialized agents: an Explorer for multi-granularity multimodal retrieval, a Refiner for distilling interaction traces into structured evidence and reasoning memories, and a Reflector for checking evidence sufficiency and providing targeted feedback. Across iterations, the agents rely on a dynamically updated structured memory rather than a full accumulated interaction history. This design reduces context noise while preserving answer-critical facts and their logical dependencies. Experiments on MMLongBench-Doc and DocBench show that MARDoc achieves strong results, outperforming same-backbone baselines and demonstrating the effectiveness of structured memory for agentic document QA.
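
To make the division of labor described in the abstract concrete, the following is a minimal toy sketch of an Explorer / Refiner / Reflector loop over a structured memory. It is not the authors' implementation: the `Memory` class, the function names, the word-overlap retrieval, and the `required_terms` sufficiency check are all assumptions chosen for illustration, standing in for the multimodal retrieval and model-based refinement and reflection that the paper describes.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical structured memory: distilled evidence plus reasoning notes."""
    evidence: list[str] = field(default_factory=list)
    reasoning: list[str] = field(default_factory=list)


def explore(query: str, pages: list[str], k: int = 2) -> list[str]:
    """Toy Explorer: rank pages by word overlap with the query (assumed retrieval)."""
    terms = set(query.lower().split())
    ranked = sorted(pages, key=lambda p: -len(terms & set(p.lower().split())))
    return ranked[:k]


def refine(memory: Memory, observations: list[str]) -> Memory:
    """Toy Refiner: keep only new observations as evidence; the raw trace is dropped."""
    for obs in observations:
        if obs not in memory.evidence:
            memory.evidence.append(obs)
    memory.reasoning.append(f"{len(memory.evidence)} evidence items after refinement")
    return memory


def reflect(memory: Memory, required_terms: list[str]) -> list[str]:
    """Toy Reflector: report which required terms the evidence does not yet cover."""
    text = " ".join(memory.evidence).lower()
    return [t for t in required_terms if t not in text]


def answer_loop(question: str, pages: list[str], required_terms: list[str],
                max_iters: int = 3) -> Memory:
    """Iterate explore -> refine -> reflect, carrying only the structured memory."""
    memory = Memory()
    query = question
    for _ in range(max_iters):
        memory = refine(memory, explore(query, pages))
        missing = reflect(memory, required_terms)
        if not missing:
            break  # evidence judged sufficient
        query = " ".join(missing)  # targeted feedback drives the next retrieval
    return memory
```

The point of the sketch is that each iteration sees only `Memory`, not the accumulated interaction history, and that the Reflector's feedback determines the next query.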
