2605.00529v1 May 01, 2026 cs.LG

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

Original Abstract

Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods, which were designed for single-document retrieval, face three critical challenges when scaling to cross-document multi-hop questions: (1) poor distribution adaptability, where $k$-means clustering introduces noise due to rigid distribution assumptions; (2) structural isolation, as tree indexes lack explicit cross-document connections; and (3) coarse abstraction, which obscures fine-grained details. To address these limitations, we propose $\Psi$-RAG, a Tree-RAG framework with two key components. The first is a hierarchical abstract tree index built through an iterative "merging and collapse" process that adapts to data distributions without a priori assumptions. The second is a multi-granular retrieval agent that interacts with the knowledge base through reorganized queries and an agent-powered hybrid retriever. $\Psi$-RAG supports diverse tasks, from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, it outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score. Code is available at https://github.com/Newiz430/Psi-RAG.
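The abstract reports results as an average F1 score but does not say how that score is computed. As a minimal sketch, assuming the token-overlap F1 commonly used for extractive and multi-hop QA (SQuAD-style answer normalization), the metric could look like the following. The function names `normalize`, `token_f1`, and `average_f1` are illustrative and are not taken from the paper or its repository.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation and English articles, then tokenize.

    Assumption: SQuAD-style normalization; the paper may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall for one answer pair."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_f1(pairs: list[tuple[str, str]]) -> float:
    """Mean token-level F1 over (prediction, reference) pairs."""
    scores = [token_f1(p, r) for p, r in pairs]
    return sum(scores) / len(scores)
```

For instance, `token_f1("Paris, France", "Paris")` has precision 1/2 and recall 1, giving an F1 of 2/3. Whether the reported 25.9% and 7.4% gains are absolute F1 points or relative improvements is not stated in the abstract.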
