2601.05027v1 Jan 08, 2026 cs.AI

OptiSet: Unified Optimizing Set Selection and Ranking for Retrieval-Augmented Generation

Yi Jiang (Citations: 175, h-index: 4)
Sendong Zhao (Citations: 671, h-index: 12)
Jianbo Li (Citations: 12, h-index: 2)
Bairui Hu (Citations: 0, h-index: 0)
Yanrui Du (Citations: 163, h-index: 8)
Hao Wang (Citations: 479, h-index: 9)
Bing Qin (Citations: 562, h-index: 11)

Abstract

Retrieval-Augmented Generation (RAG) improves generation quality by incorporating evidence retrieved from large external corpora. However, most existing methods statically select the top-k passages by individual relevance, which fails to exploit combinatorial gains among passages and often introduces substantial redundancy. To address this limitation, we propose OptiSet, a set-centric framework that unifies set selection and set-level ranking for RAG. OptiSet adopts an "Expand-then-Refine" paradigm: it first expands a query into multiple perspectives to build a diverse candidate pool, and then refines that pool via re-selection to form a compact evidence set. We then devise a self-synthesis strategy that requires no supervision from a strong LLM: it derives preference labels from changes in the generator's set-conditional utility, thereby identifying complementary and redundant evidence. Finally, we introduce a set-list-wise training strategy that jointly optimizes set selection and set-level ranking, enabling the model to favor compact, high-gain evidence sets. Extensive experiments demonstrate that OptiSet improves performance on complex combinatorial problems and makes generation more efficient. The source code is publicly available.
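
To make the contrast between per-passage top-k selection and set-level selection concrete, here is a minimal toy sketch. It is not OptiSet's method: the abstract describes a learned selector trained on generator utility, whereas this sketch swaps in a plain greedy marginal-gain heuristic over hand-labeled "aspects". The passage IDs, the aspect labels, and the function names (`individual_relevance`, `top_k`, `set_utility`, `greedy_set`) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: each passage is tagged with the query aspects it covers.
PASSAGES: Dict[str, Set[str]] = {
    "p1": {"birthplace", "birth_year"},
    "p2": {"birthplace", "birth_year"},  # near-duplicate of p1
    "p3": {"employer"},                  # complementary to p1
    "p4": {"birthplace"},
}
QUERY_ASPECTS: Set[str] = {"birthplace", "birth_year", "employer"}


def individual_relevance(aspects: Set[str]) -> int:
    """Score one passage in isolation: number of query aspects it covers."""
    return len(aspects & QUERY_ASPECTS)


def top_k(k: int) -> List[str]:
    """Static top-k by individual relevance (ties broken by passage ID)."""
    ranked = sorted(PASSAGES, key=lambda p: (-individual_relevance(PASSAGES[p]), p))
    return ranked[:k]


def set_utility(selected: List[str]) -> int:
    """Toy stand-in for set-conditional utility: aspects covered by the whole set."""
    covered: Set[str] = set()
    for p in selected:
        covered |= PASSAGES[p]
    return len(covered & QUERY_ASPECTS)


def greedy_set(max_size: int) -> List[str]:
    """Greedy selection by marginal gain; stops early when nothing adds utility."""
    selected: List[str] = []
    while len(selected) < max_size:
        base = set_utility(selected)
        best: Optional[str] = None
        best_gain = 0
        for p in sorted(PASSAGES):
            if p in selected:
                continue
            gain = set_utility(selected + [p]) - base  # utility change from adding p
            if gain > best_gain:
                best, best_gain = p, gain
        if best is None:  # every remaining passage is redundant given the set
            break
        selected.append(best)
    return selected


if __name__ == "__main__":
    print("top-k:", top_k(2), "utility:", set_utility(top_k(2)))
    print("greedy set:", greedy_set(3), "utility:", set_utility(greedy_set(3)))
```

In this toy setup, top-k with k = 2 picks the two near-duplicate passages and covers only two of the three aspects, while the greedy set stops at two passages that together cover all three. The zero marginal gain of `p2` once `p1` is selected is the kind of utility-change signal that the abstract describes using to label evidence as redundant rather than complementary.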

