2604.24608v1 Apr 27, 2026 cs.IR

Query-Dependent Head Selection in LLM-Based Attention Re-ranking: Attention-Based Re-ranking via Query Routing

Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models

Fengran Mo

University of Montreal

Citations: 892

h-index: 18

Yuxing Tian

Citations: 3

h-index: 1

Weixu Zhang

Citations: 57

h-index: 1

Jian-Yun Nie

Citations: 398

h-index: 11

Zhiqi Huang

Citations: 41

h-index: 3

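Recently, large language models (LLMs) have been explored as fine-grained zero-shot re-rankers that use attention signals to estimate document relevance. However, existing methods either aggregate attention signals across all heads or rely on a fixed set of heads pre-selected by heuristic rules. These approaches can be suboptimal because the informative heads may vary with the query or domain. In addition, naively combining multiple heads can degrade performance because of redundant or conflicting ranking signals. This paper proposes RouteHead, a query-dependent head selection method for LLM-based attention re-ranking. Specifically, it learns a lightweight router that maps each query to an optimal set of heads, and computes relevance scores by aggregating the attention signals collected from those heads. Because no labels exist for optimal query-head pairs, pseudo labels are first constructed through an offline search. The router represents each head with a learnable embedding and represents each query with an embedding extracted from the hidden states of the frozen LLM. The router is then trained on the pseudo labels with a sparsity regularizer. In experiments across diverse benchmarks and several LLM backbones, the proposed method consistently outperformed strong baselines.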

Original Abstract

Large Language Models (LLMs) have recently been explored as fine-grained zero-shot re-rankers by leveraging attention signals to estimate document relevance. However, existing methods either aggregate attention signals across all heads or rely on a statically selected subset identified by heuristic rules. This solution can be suboptimal because the informative heads can vary across queries or domains. Moreover, naively combining multiple heads can degrade performance due to redundancy or conflicting ranking signals. In this paper, we propose a query-dependent head selection method, RouteHead, for attention-based re-ranking with LLMs. Specifically, we learn a lightweight router that can map each query to an optimal head set, and relevance scores are computed by aggregating attention signals only from these heads. Since query-to-head optimal labels are unavailable, we first construct pseudo labels via an offline search. The router represents each head with a learnable embedding and represents each query using an embedding extracted from the hidden states of the frozen LLM. Then it is trained on the pseudo labels with a sparsity regularizer. Experiments on diverse benchmarks and multiple LLM backbones show that the proposed method consistently outperforms strong baselines.
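
The routing idea in the abstract can be illustrated with a small sketch. The abstract does not specify how the router scores heads, how many heads it keeps, or the exact form of the loss, so everything concrete below is an assumption made for illustration: dot-product scoring between the query embedding and head embeddings, top-k selection, summed attention mass as the relevance score, and a binary cross-entropy loss with an L1-style sparsity penalty. The function names (`route_heads`, `rerank`, `router_loss`) and all numbers are hypothetical and are not taken from the paper.

```python
import math


def route_heads(query_emb, head_embs, k):
    """Score each head by dot product with the query embedding; keep the top-k.

    Assumption: dot-product scoring and a fixed k are illustrative choices.
    """
    scores = [sum(q * h for q, h in zip(query_emb, head)) for head in head_embs]
    ranked = sorted(range(len(head_embs)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores


def rerank(head_doc_attention, selected_heads):
    """Sum per-document attention mass over the selected heads, then sort documents."""
    num_docs = len(head_doc_attention[0])
    doc_scores = [
        sum(head_doc_attention[h][d] for h in selected_heads) for d in range(num_docs)
    ]
    order = sorted(range(num_docs), key=lambda d: doc_scores[d], reverse=True)
    return order, doc_scores


def router_loss(scores, pseudo_label, sparsity_weight=0.01):
    """Binary cross-entropy against pseudo labels plus a penalty on head probabilities.

    Assumption: the penalty is the sum of sigmoid probabilities (L1-style).
    """
    probs = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    eps = 1e-9
    bce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, pseudo_label)
    ) / len(probs)
    return bce + sparsity_weight * sum(probs)


# Toy setup: 4 heads, 3 candidate documents, 2-dimensional embeddings.
head_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query_emb = [0.9, 0.1]

# head_doc_attention[h][d]: attention mass head h places on document d (made-up values).
head_doc_attention = [
    [0.1, 0.7, 0.2],
    [0.6, 0.2, 0.2],
    [0.2, 0.5, 0.3],
    [0.4, 0.3, 0.3],
]

selected, scores = route_heads(query_emb, head_embs, k=2)
order, doc_scores = rerank(head_doc_attention, selected)
all_heads_order, _ = rerank(head_doc_attention, range(len(head_embs)))

print("selected heads:", selected)
print("routed ranking:", order)
print("all-heads ranking:", all_heads_order)
print("loss:", router_loss(scores, pseudo_label=[1, 0, 1, 0]))
```

In this toy case the router keeps heads 0 and 2, and the ranking built from those two heads differs from the one obtained by summing over all four heads, which is the effect the abstract attributes to redundant or conflicting heads. In a real system the query embedding would come from the frozen LLM's hidden states and the pseudo labels from the offline search; both are replaced by hand-written values here.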
