2601.20844v2 Jan 28, 2026 cs.LG

임베딩 기반 상위-k 검색을 위한 $\mathbb{R}^{2k}$ 공간의 이론적 충분성

$\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval

Yangqiu Song

Citations: 269

h-index: 4

Zihao Wang

Citations: 120

h-index: 6

Hang Yin

Tsinghua University

Citations: 133

h-index: 7

Lihui Liu

Citations: 617

h-index: 14

Hanghang Tong

Citations: 347

h-index: 7

Ginny Y. Wong

Citations: 392

h-index: 12

Simon See

Citations: 86

h-index: 4

본 논문에서는 부분집합 관계(m개의 원소를 가지는 ${m \choose k}$개의 최대 k개 원소로 이루어진 부분집합)를 벡터 공간에 임베딩하는 데 필요한 최소 차원을 연구합니다. 이러한 최소 임베딩 가능한 차원(Minimal Embeddable Dimension, MED)에 대한 엄격한 경계를 이론적으로 도출하고, 다양한 '거리' 또는 '유사성' 개념, 즉 $\ell_2$ 거리, 내적, 코사인 유사성에 대해 실험적으로 이를 뒷받침합니다. 또한, 보다 현실적인 설정에서 수치 시뮬레이션을 수행했는데, 여기서 ${m \choose k}$개의 부분집합 임베딩은 포함된 원소들의 임베딩의 중심값으로 선택되었습니다. 이러한 시뮬레이션을 통해, 임베딩 가능한 차원과 임베딩할 원소의 개 사이에 로그 함수적 관계가 존재함을 쉽게 확인할 수 있었습니다. 이러한 결과는 임베딩 기반 검색의 제한이 주로 기하학적 제약 조건보다는 학습 가능성 문제에서 비롯된다는 것을 시사하며, 이는 향후 알고리즘 설계에 중요한 지침이 될 것입니다.

Original Abstract

This paper studies the minimal dimension required to embed subset memberships ($m$ elements and ${m\choose k}$ subsets of at most $k$ elements) into vector spaces, denoted as Minimal Embeddable Dimension (MED). The tight bounds of MED are derived theoretically and supported empirically for various notions of "distances" or "similarities," including the $\ell_2$ metric, inner product, and cosine similarity. In addition, we conduct numerical simulation in a more achievable setting, where the ${m\choose k}$ subset embeddings are chosen as the centroid of the embeddings of the contained elements. Our simulation easily realizes a logarithmic dependency between the MED and the number of elements to embed. These findings imply that embedding-based retrieval limitations stem primarily from learnability challenges, not geometric constraints, guiding future algorithm design.

0 Citations

0 Influential

7 Altmetric

35.0 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!