2604.27852v1 Apr 30, 2026 cs.IR

NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains

Rongjin Li (Citations: 34, h-index: 2)
Shiyao Peng (Citations: 138, h-index: 4)
Qianhe Zheng (Citations: 32, h-index: 2)
Zhuodi Hao (Citations: 11, h-index: 2)
Zichen Tang (Citations: 39, h-index: 3)
Qing Huang (Citations: 49, h-index: 4)
Jiayu Huang (Citations: 352, h-index: 8)
Jiacheng Liu (Citations: 36, h-index: 2)
Yifan Zhu (Citations: 82, h-index: 5)
H. E (Citations: 695, h-index: 11)

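Precise recall is a core objective of Retrieval-Augmented Generation (RAG), yet the field has overlooked an important point: better retrieval performance does not always bring a comparable gain in reasoning performance. To diagnose this gap, we propose the Recall Conversion Rate (RCR), a new evaluation metric that quantifies how much retrieval contributes to reasoning accuracy. A quantitative analysis of mainstream RAG methods shows that RCR decays almost linearly as Recall@5 improves, which points to insufficient attention to retrieval quality as the underlying cause. Conversely, approaches that focus only on quality optimization often show poor recall. Both categories lack a comprehensive understanding of retrieval quality optimization and therefore face a trade-off dilemma. To address these problems, we present comprehensive criteria for retrieval quality optimization and introduce the NeocorRAG framework, which achieves holistic retrieval quality optimization by systematically mining and using evidence chains. Specifically, NeocorRAG first uses an innovative activated search algorithm to obtain a refined candidate space, then ensures precise evidence chain generation through constrained decoding. Finally, the retrieved set of evidence chains guides the retrieval optimization process. On the HotpotQA, 2WikiMultiHopQA, MuSiQue, and NQ benchmarks, NeocorRAG achieves state-of-the-art (SOTA) performance on both 3B and 70B parameter models while using less than 20% of the tokens consumed by comparable methods. This work presents an efficient, training-free paradigm for RAG enhancement that effectively optimizes retrieval quality while maintaining high recall. Our code is released at https://github.com/BUPT-Reasoning-Lab/NeocorRAG.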

Original Abstract

Although precise recall is a core objective in Retrieval-Augmented Generation (RAG), a critical oversight persists in the field: improvements in retrieval performance do not consistently translate to commensurate gains in downstream reasoning. To diagnose this gap, we propose the Recall Conversion Rate (RCR), a novel evaluation metric to quantify the contribution of retrieval to reasoning accuracy. Our quantitative analysis of mainstream RAG methods reveals that as Recall@5 improves, the RCR exhibits a near-linear decay. We identify the neglect of retrieval quality in these methods as the underlying cause. In contrast, approaches that focus solely on quality optimization often suffer from inferior recall performance. Both categories lack a comprehensive understanding of retrieval quality optimization, resulting in a trade-off dilemma. To address these challenges, we propose comprehensive retrieval quality optimization criteria and introduce the NeocorRAG framework. This framework achieves holistic retrieval quality optimization by systematically mining and utilizing Evidence Chains. Specifically, NeocorRAG first employs an innovative activated search algorithm to obtain a refined candidate space. Then it ensures precise evidence chain generation through constrained decoding. Finally, the retrieved set of evidence chains guides the retrieval optimization process. Evaluated on benchmarks including HotpotQA, 2WikiMultiHopQA, MuSiQue, and NQ, NeocorRAG achieves SOTA performance on both 3B and 70B parameter models, while consuming less than 20% of tokens used by comparable methods. This study presents an efficient, training-free paradigm for RAG enhancement that effectively optimizes retrieval quality while maintaining high recall. Our code is released at https://github.com/BUPT-Reasoning-Lab/NeocorRAG.
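
The abstract relates Recall@5 to downstream answer accuracy but does not give the formula for RCR. The sketch below is only an illustration of that relationship. `recall_at_k` follows the standard definition of Recall@k. `conversion_ratio` is a hypothetical stand-in (accuracy divided by mean recall) and is not the paper's definition of RCR. The toy passages and labels are made up for the example.

```python
from typing import Iterable, Sequence, Set


def recall_at_k(retrieved: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """Fraction of gold passages found among the top-k retrieved passages."""
    if not gold:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & gold) / len(gold)


def mean(values: Iterable[float]) -> float:
    """Arithmetic mean; returns 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def conversion_ratio(accuracy: float, recall: float) -> float:
    """Hypothetical proxy: answer accuracy per unit of recall.

    ASSUMPTION: this is NOT the RCR formula from the paper, which the
    abstract does not state. It only illustrates the idea of asking how
    much of the retrieved evidence turns into correct answers.
    """
    return accuracy / recall if recall > 0 else 0.0


# Toy evaluation set (invented data): ranked passage ids, gold ids,
# and whether the generator answered correctly.
examples = [
    {"retrieved": ["p1", "p7", "p3", "p9", "p2", "p4"],
     "gold": {"p1", "p2"}, "correct": True},
    {"retrieved": ["p5", "p6", "p8", "p1", "p0"],
     "gold": {"p5", "p4"}, "correct": False},
]

avg_recall = mean(recall_at_k(e["retrieved"], e["gold"]) for e in examples)
accuracy = mean(1.0 if e["correct"] else 0.0 for e in examples)
ratio = conversion_ratio(accuracy, avg_recall)

print(f"Recall@5={avg_recall:.2f} accuracy={accuracy:.2f} ratio={ratio:.2f}")
```

Under this proxy, a method that raises Recall@5 without raising accuracy sees its ratio fall, which is the kind of gap the abstract describes.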
