2603.20034v1 Mar 20, 2026 cs.IR

CoverageBench: Evaluating Information Coverage across Tasks and Domains

Benjamin Van Durme

Citations: 1,229

h-index: 18

Dawn J. Lawrie

Citations: 483

h-index: 11

Saron Samuel

Citations: 20

h-index: 3

Eugene Yang

Citations: 44

h-index: 4

Andrew Yates

Citations: 146

h-index: 5

Ian Soboroff

Citations: 116

h-index: 4

Trevor Adriaanse

Citations: 3

h-index: 1

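This work aims to measure the information coverage of ad hoc retrieval algorithms. Information coverage describes how much of the range of relevant information the search results contain. It is an important factor for retrieval systems, especially retrieval-augmented generation (RAG) systems. The traditional ad hoc retrieval metrics, precision and recall, give a higher score the more relevant documents a system retrieves. However, because relevance in ad hoc test collections is defined for each document individually, without regard to other documents, high recall is sufficient but not necessary to ensure coverage. The same holds for other metrics such as rank-biased precision (RBP), normalized discounted cumulative gain (nDCG), and mean average precision (MAP). Test collections developed with diversity ranking in web search in mind include multiple aspects that support coverage in the web domain. In this work, a suite of test collections for evaluating information coverage was built from existing collections. The suite gives researchers a unified test environment spanning diverse genres and tasks. All topics, nuggets, relevance labels, and baseline rankings are released on Hugging Face Datasets, together with instructions for accessing the publicly available document collections.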

Original Abstract

We wish to measure the information coverage of an ad hoc retrieval algorithm, that is, how much of the range of available relevant information is covered by the search results. Information coverage is a central aspect for retrieval, especially when the retrieval system is integrated with generative models in a retrieval-augmented generation (RAG) system. The classic metrics for ad hoc retrieval, precision and recall, reward a system as more and more relevant documents are retrieved. However, since relevance in ad hoc test collections is defined for a document without any relation to other documents that might contain the same information, high recall is sufficient but not necessary to ensure coverage. The same is true for other metrics such as rank-biased precision (RBP), normalized discounted cumulative gain (nDCG), and mean average precision (MAP). Test collections developed around the notion of diversity ranking in web search incorporate multiple aspects that support a concept of coverage in the web domain. In this work, we construct a suite of collections for evaluating information coverage from existing collections. This suite offers researchers a unified testbed spanning multiple genres and tasks. All topics, nuggets, relevance labels, and baseline rankings are released on Hugging Face Datasets, along with instructions for accessing the publicly available document collections.
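The claim that high recall is sufficient but not necessary for coverage can be illustrated with a small sketch. The code below is not the paper's metric or data: the function names, the document and nugget identifiers, and the definition of coverage as "fraction of distinct nuggets found in the top k" are all assumptions made for illustration.

```python
def recall_at_k(ranking, relevant, k):
    """Fraction of the relevant documents that appear in the top k."""
    retrieved = set(ranking[:k])
    return len(retrieved & relevant) / len(relevant)


def nugget_coverage_at_k(ranking, doc_nuggets, k):
    """Fraction of distinct nuggets contained in at least one top-k document."""
    all_nuggets = set().union(*doc_nuggets.values())
    covered = set()
    for doc in ranking[:k]:
        # Documents without an entry carry no nuggets (treated as non-relevant).
        covered |= doc_nuggets.get(doc, set())
    return len(covered) / len(all_nuggets)


# Hypothetical topic: four relevant documents, but only two distinct nuggets.
doc_nuggets = {
    "d1": {"n1"},
    "d2": {"n1"},
    "d3": {"n1"},
    "d4": {"n2"},
}
relevant = set(doc_nuggets)

redundant = ["d1", "d2", "d3"]  # three relevant documents repeating one nugget
diverse = ["d1", "d4", "x9"]    # two relevant documents plus a non-relevant one

recall_redundant = recall_at_k(redundant, relevant, 3)
coverage_redundant = nugget_coverage_at_k(redundant, doc_nuggets, 3)
recall_diverse = recall_at_k(diverse, relevant, 3)
coverage_diverse = nugget_coverage_at_k(diverse, doc_nuggets, 3)

print(recall_redundant, coverage_redundant)
print(recall_diverse, coverage_diverse)
```

In this toy case the redundant ranking has the higher recall yet covers only half of the nuggets, while the diverse ranking reaches full coverage with lower recall. Only perfect recall would guarantee full coverage, which is the sense in which recall is sufficient but not necessary.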

2 Citations

0 Influential

9 Altmetric

47.0 Score
