2604.23388v1 Apr 25, 2026 cs.IR

A Parametric Memory Head for Continual Generative Retrieval

M. D. Rijke

Citations: 1,639

h-index: 20

Kidist Amde Mekonnen

Citations: 15

h-index: 3

Yubao Tang

Citations: 73

h-index: 4

Original Abstract

Generative information retrieval (GenIR) consolidates retrieval into a single neural model that decodes document identifiers (docids) directly from queries. While this model-as-index paradigm offers architectural simplicity, it is poorly suited to dynamic document collections. Unlike modular systems, where indexes are easily updated, GenIR's knowledge is parametrically encoded in its weights; consequently, standard adaptation methods such as full and parameter-efficient fine-tuning can induce catastrophic forgetting. We show that sequential adaptation improves retrieval on newly added documents but substantially degrades performance on earlier slices, exposing a pronounced stability-plasticity trade-off. To address this, we propose post-adaptation memory tuning (PAMT), a memory-only stabilization stage that augments an adapted model with a modular parametric memory head (PMH). PAMT freezes the backbone and attaches a product-key memory with fixed addressing. During prefix-trie constrained decoding, decoder hidden states sparsely query PMH to produce residual corrections in hidden space; these corrections are mapped to score adjustments via the frozen output embedding matrix, computed only over trie-valid tokens. This guides docid generation while keeping routing and backbone parameters fixed. To limit cross-slice interference, PAMT updates only a fixed budget of memory values selected using decoding-time access statistics, prioritizing entries frequently activated by the current slice and rarely used in prior sessions. Experiments on MS MARCO and Natural Questions under sequential, disjoint corpus increments show that PAMT substantially improves retention on earlier slices with minimal impact on retrieval performance for newly added documents, while modifying only a sparse subset of memory values per session.
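The mechanism described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it is a toy, standard-library-only rendering of three ideas from the text: a product-key memory with fixed sub-keys, a residual correction in hidden space that is projected through the output embedding for trie-valid tokens only, and a budgeted choice of which memory values to update. All names (`ParametricMemoryHead`, `adjusted_scores`, `select_update_budget`) are hypothetical, and the priority rule `current / (1 + prior)` is an assumption, since the abstract states only that frequently activated and previously rarely used entries are preferred.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


class ParametricMemoryHead:
    """Toy product-key memory: n*n value slots addressed by two sub-key tables."""

    def __init__(self, dim, n_sub_keys, k, seed=0):
        assert dim % 2 == 0
        rng = random.Random(seed)
        self.n, self.k, self.half = n_sub_keys, k, dim // 2
        # Sub-keys are never updated ("fixed addressing"); only values would be trained.
        self.keys1 = [[rng.gauss(0, 1) for _ in range(self.half)] for _ in range(n_sub_keys)]
        self.keys2 = [[rng.gauss(0, 1) for _ in range(self.half)] for _ in range(n_sub_keys)]
        self.values = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_sub_keys ** 2)]
        self.access_counts = [0] * (n_sub_keys ** 2)  # decoding-time access statistics

    def read(self, hidden):
        """Return a residual correction in hidden space built from k sparse slots."""
        s1 = [dot(hidden[:self.half], key) for key in self.keys1]
        s2 = [dot(hidden[self.half:], key) for key in self.keys2]
        # Candidate slots: Cartesian product of the per-half top-k sub-keys.
        cands = [(s1[i] + s2[j], i * self.n + j)
                 for i in top_k(s1, self.k) for j in top_k(s2, self.k)]
        best = sorted(cands, reverse=True)[:self.k]
        peak = best[0][0]
        weights = [math.exp(score - peak) for score, _ in best]  # stable softmax
        total = sum(weights)
        delta = [0.0] * len(hidden)
        for w, (_, slot) in zip(weights, best):
            self.access_counts[slot] += 1
            for d, v in enumerate(self.values[slot]):
                delta[d] += (w / total) * v
        return delta


def adjusted_scores(base_logits, hidden, out_embed, valid_tokens, pmh):
    """Add PMH score adjustments, computed only for trie-valid next tokens."""
    delta = pmh.read(hidden)
    return {t: base_logits[t] + dot(out_embed[t], delta) for t in valid_tokens}


def select_update_budget(current_counts, prior_counts, budget):
    """Pick `budget` slots that the current slice hits often and earlier sessions rarely hit.

    The ratio below is an assumed priority rule, not one taken from the paper.
    """
    priority = [c / (1.0 + p) for c, p in zip(current_counts, prior_counts)]
    return sorted(top_k(priority, budget))
```

In a decoding loop, `valid_tokens` would come from the prefix trie at the current step, `out_embed` would be the frozen output embedding matrix, and only the slots returned by `select_update_budget` would receive gradient updates in a given session.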
