2604.07863v1 Apr 09, 2026 cs.IR

Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory

K. Berahmand

Citations: 3,078

h-index: 34

Saman Forouzandeh

Citations: 6

h-index: 2

Mahdi Jalili

Citations: 26

h-index: 4

Abstract

Retrieving relevant observations from long multi-modal web interaction histories is challenging because relevance depends on the evolving task state, modality (screenshots, HTML text, structured signals), and temporal distance. Prior approaches typically rely on static similarity thresholds or fixed-capacity buffers, which fail to adapt relevance to the current task context. We propose **ACGM**, a learned graph-memory retriever that constructs *task-adaptive* relevance graphs over agent histories using policy-gradient optimization from downstream task success. ACGM captures heterogeneous temporal dynamics with modality-specific decay (visual decays $4.3\times$ faster than text: $\lambda_v = 0.47$ vs. $\lambda_x = 0.11$) and learns sparse connectivity (3.2 edges/node), enabling efficient $O(\log T)$ retrieval. Across WebShop, VisualWebArena, and Mind2Web, ACGM improves retrieval quality to **82.7 nDCG@10** (+9.3 over GPT-4o, $p < 0.001$) and **89.2% Precision@10** (+7.7), outperforming 19 strong dense, re-ranking, multi-modal, and graph-based baselines. Code to reproduce our results is available at https://github.com/S-Forouzandeh/ACGM-Agentic-Web.
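To make the reported quantities concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the modality-specific decay is exponential in temporal distance, $w = e^{-\lambda \Delta t}$, which the abstract does not state explicitly. It also computes Precision@k and nDCG@k in their common textbook form with linear gain, where the ideal ranking is taken from the retrieved list itself. The names `LAMBDA`, `decay_weight`, `precision_at_k`, `dcg_at_k`, and `ndcg_at_k` are hypothetical and chosen for illustration only.

```python
import math
from typing import Sequence

# Decay rates quoted in the abstract; the unit of temporal distance is not
# specified there, so "distance" below is an assumed step count.
LAMBDA = {"visual": 0.47, "text": 0.11}


def decay_weight(modality: str, distance: float) -> float:
    """Assumed exponential down-weighting exp(-lambda * distance)."""
    return math.exp(-LAMBDA[modality] * distance)


def precision_at_k(relevance: Sequence[float], k: int = 10) -> float:
    """Fraction of the top-k ranked items with a positive relevance label."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))


def ndcg_at_k(relevance: Sequence[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the same labels sorted best-first."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Ratio of the two rates: about 4.27, which the abstract rounds to 4.3.
    print(LAMBDA["visual"] / LAMBDA["text"])
    for distance in (1, 5, 10):
        print(distance, decay_weight("visual", distance), decay_weight("text", distance))
    ranked_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # made-up labels, not paper data
    print(precision_at_k(ranked_labels), ndcg_at_k(ranked_labels))
```

Under this assumed form, the "4.3 times faster" figure is simply the ratio $\lambda_v / \lambda_x$: a screenshot's weight falls off much sooner than that of HTML text observed at the same temporal distance.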

0 Citations

0 Influential

37 Altmetric

185.0 Score
