2603.03770v1 Mar 04, 2026 cs.IR

Not All Candidates are Created Equal: A Heterogeneity-Aware Approach to Pre-ranking in Recommender Systems

Pengfei Tong (Citations: 26, h-index: 1)
Chenwei Zhang (Citations: 108, h-index: 4)
Bohao Wang (Citations: 133, h-index: 7)
Qi Pi (Citations: 2,093, h-index: 6)
Zuotao Liu (Citations: 96, h-index: 4)
Siyuan Chen (Citations: 44, h-index: 4)
Pixu Li (Citations: 11, h-index: 2)

Original Abstract

Most large-scale recommender systems follow a multi-stage cascade of retrieval, pre-ranking, ranking, and re-ranking. A key challenge at the pre-ranking stage arises from the heterogeneity of training instances sampled from coarse-grained retrieval results, fine-grained ranking signals, and exposure feedback. Our analysis reveals that prevailing pre-ranking methods, which indiscriminately mix heterogeneous samples, suffer from gradient conflicts: hard samples dominate training while easy ones remain underutilized, leading to suboptimal performance. We further show that the common practice of uniformly scaling model complexity across all samples is inefficient, as it overspends computation on easy cases and slows training without proportional gains. To address these limitations, this paper presents Heterogeneity-Aware Adaptive Pre-ranking (HAP), a unified framework that mitigates gradient conflicts through conflict-sensitive sampling coupled with tailored loss design, while adaptively allocating computational budgets across candidates. Specifically, HAP disentangles easy and hard samples, directing each subset along dedicated optimization paths. Building on this separation, it first applies lightweight models to all candidates for efficient coverage, and further engages stronger models on the hard ones, maintaining accuracy while reducing cost. This approach not only improves pre-ranking effectiveness but also provides a practical perspective on scaling strategies in industrial recommender systems. HAP has been deployed in the Toutiao production system for 9 months, yielding up to 0.4% improvement in user app usage duration and 0.05% in active days, without additional computational cost. We also release a large-scale industrial hybrid-sample dataset to enable the systematic study of source-driven candidate heterogeneity in pre-ranking.
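The two ideas in the abstract, gradient conflict between sample groups and spending the stronger model only on hard candidates, can be illustrated with a minimal sketch. The abstract does not say how conflict is measured or how hard samples are identified, so everything here is an assumption made for illustration: conflict is taken to mean negative cosine similarity between two gradient vectors, hardness is approximated by how close the lightweight score is to 0.5, and the names `cosine`, `cascade_scores`, `light_model`, `heavy_model`, and `heavy_budget` are hypothetical. This is not HAP's actual implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Cosine similarity of two gradient vectors.

    Assumption: a negative value is read as a gradient conflict,
    i.e. a step that helps one sample group hurts the other.
    """
    dot = sum(a * b for a, b in zip(g1, g2))
    norm = math.sqrt(sum(a * a for a in g1)) * math.sqrt(sum(b * b for b in g2))
    return dot / norm if norm else 0.0


def cascade_scores(
    candidates: Sequence[float],
    light_model: Callable[[float], float],
    heavy_model: Callable[[float], float],
    heavy_budget: int,
) -> Tuple[List[float], List[int]]:
    """Score all candidates cheaply, then re-score only the hardest ones.

    Hypothetical hardness proxy: a light score near 0.5 is treated as
    uncertain, so those candidates receive the heavy model first.
    """
    # Stage 1: the lightweight model covers every candidate.
    scores = [light_model(c) for c in candidates]

    # Rank candidates from most to least uncertain under the proxy.
    by_uncertainty = sorted(range(len(candidates)), key=lambda i: abs(scores[i] - 0.5))
    hard = by_uncertainty[:heavy_budget]

    # Stage 2: the stronger model runs only within the fixed budget.
    for i in hard:
        scores[i] = heavy_model(candidates[i])
    return scores, sorted(hard)


if __name__ == "__main__":
    # Toy stand-ins for the two models; real models would take feature vectors.
    light = lambda x: x
    heavy = lambda x: 1.0 if x > 0.5 else 0.0

    scores, hard = cascade_scores([0.1, 0.45, 0.9, 0.55, 0.3], light, heavy, heavy_budget=2)
    print(hard)    # indices that received the heavy model
    print(scores)  # final scores after the second stage
    print(cosine([1.0, 0.0], [-1.0, 0.0]))  # opposing gradients
```

Fixing `heavy_budget` as an absolute count keeps the cost of the second stage constant regardless of how many candidates arrive, which is one simple way to read "without additional computational cost"; a production system would more likely learn the routing decision than use a fixed uncertainty threshold.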
