2601.03211v1 Jan 06, 2026 cs.IR

Efficient Enterprise Search Relevance Labeling with Small Language Models

Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers

Nick Craswell

Citations: 5,842

h-index: 33

Sulaiman Vesal

Stanford.edu, fau.de

Citations: 1,759

h-index: 18

Tianwei Chen

Citations: 1

h-index: 1

Ye Wang

Citations: 72

h-index: 4

Yue Kang

Citations: 2

h-index: 1

Benji Schussheim

Citations: 0

h-index: 0

Diana Licon

Citations: 0

h-index: 0

Shixing Cao

Citations: 17

h-index: 1

Kunho Kim

Citations: 4

h-index: 1

Billy Norcilien

Citations: 0

h-index: 0

Jonah Karpman

Citations: 0

h-index: 0

Mahmound Sayed

Citations: 0

h-index: 0

Mike Taylor

Citations: 353

h-index: 8

Tao Sun

Citations: 169

h-index: 5

Pavel Metrikov

Citations: 71

h-index: 5

Vipul Agarwal

Citations: 193

h-index: 7

Irene Shaffer

Citations: 16

h-index: 2

Soundar Srinivasan

Citations: 9

h-index: 2

Jacob Danovitch

Microsoft

Citations: 305

h-index: 4

Chris Quirk

Citations: 86

h-index: 3

Zhuoyi Huang

Citations: 59

h-index: 4

D. Atia

Citations: 25

h-index: 3

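In enterprise search, building high-quality datasets at scale remains a major challenge because good labeled data is hard to obtain. To address this, the paper proposes a method for efficiently fine-tuning small language models (SLMs) for accurate relevance labeling, enabling high-throughput, domain-specific labeling whose quality is comparable to or better than that of state-of-the-art large language models (LLMs). To compensate for the lack of high-quality datasets in the enterprise domain, the method relies on synthetic data generation. Specifically, an LLM generates realistic enterprise search queries from a seed document, BM25 retrieves hard negatives, and a second LLM (the teacher model) assigns relevance scores. The resulting dataset is then distilled into an SLM, producing a compact relevance labeler. The method is evaluated on a high-quality benchmark of 923 enterprise query-document pairs annotated by trained human annotators, where the distilled SLM agrees with human judgments as well as or better than the teacher LLM. The fine-tuned labeler also raises throughput 17-fold and is 19 times more cost-effective. The approach enables scalable, cost-effective relevance labeling for enterprise-scale search applications and supports rapid offline evaluation and iteration in real-world settings.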

Original Abstract

In enterprise search, building high-quality datasets at scale remains a central challenge due to the difficulty of acquiring labeled data. To resolve this challenge, we propose an efficient approach to fine-tune small language models (SLMs) for accurate relevance labeling, enabling high-throughput, domain-specific labeling that is comparable to or even better in quality than that of state-of-the-art large language models (LLMs). To overcome the lack of high-quality and accessible datasets in the enterprise domain, our method leverages synthetic data generation. Specifically, we employ an LLM to synthesize realistic enterprise queries from a seed document, apply BM25 to retrieve hard negatives, and use a teacher LLM to assign relevance scores. The resulting dataset is then distilled into an SLM, producing a compact relevance labeler. We evaluate our approach on a high-quality benchmark consisting of 923 enterprise query-document pairs annotated by trained human annotators, and show that the distilled SLM achieves agreement with human judgments on par with or better than the teacher LLM. Furthermore, our fine-tuned labeler substantially improves throughput, achieving a 17-fold increase while also being 19 times more cost-effective. This approach enables scalable and cost-effective relevance labeling for enterprise-scale retrieval applications, supporting rapid offline evaluation and iteration in real-world settings.
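The abstract names BM25 as the retriever for hard negatives but gives no implementation details. As an illustration only, the following is a minimal sketch of that one step, assuming whitespace tokenization, the common Okapi BM25 defaults `k1=1.5` and `b=0.75`, and a non-negative IDF variant. The class `BM25`, the function `mine_hard_negatives`, and the toy corpus are hypothetical names and data chosen for this example, not the authors' code.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would use a proper analyzer.
    return text.lower().split()


class BM25:
    """Small Okapi BM25 scorer over an in-memory corpus (illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        tokens = [tokenize(d) for d in docs]
        self.doc_lens = [len(t) for t in tokens]
        self.avgdl = sum(self.doc_lens) / len(docs)
        self.tfs = [Counter(t) for t in tokens]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for t in tokens:
            df.update(set(t))
        n = len(docs)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()}

    def score(self, query, idx):
        tf = self.tfs[idx]
        # Length normalization term shared by all query terms for this document.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[idx] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total


def mine_hard_negatives(query, seed_idx, docs, k=2):
    """Return indices of the top-k BM25 matches for `query`, excluding the seed document.

    Documents with a zero score share no terms with the query and are skipped,
    since they would be easy rather than hard negatives.
    """
    bm25 = BM25(docs)
    scored = [(bm25.score(query, i), i) for i in range(len(docs)) if i != seed_idx]
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy corpus: document 0 plays the role of the seed document the query was generated from.
docs = [
    "how to reset your corporate vpn password",
    "vpn client installation guide for laptops",
    "password policy for corporate email accounts",
    "cafeteria lunch menu for this week",
]
query = "reset vpn password"
negatives = mine_hard_negatives(query, seed_idx=0, docs=docs, k=2)
print(negatives)
```

In this toy run the two documents that share a term with the query are returned as candidates, while the unrelated cafeteria document is not. In the pipeline the abstract describes, such lexically similar candidates would then be passed to the teacher LLM for relevance scoring rather than assumed to be irrelevant.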

