2602.10016v2 Feb 10, 2026 cs.IR

Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

Jiaqi Xu (Citations: 766, h-index: 16)
Bojian Hou (Citations: 4, h-index: 2)
Xiaolong Liu (Citations: 36, h-index: 4)
Xiaoyi Liu (Citations: 40, h-index: 4)
Yasmine Badr (Citations: 29, h-index: 2)
M. Hang (Citations: 54, h-index: 5)
Sudhanshu Chanpuriya (Citations: 161, h-index: 6)
Jun Zhou (Citations: 108, h-index: 5)
Yuhang Yang (Citations: 239, h-index: 9)
Han Xu (Citations: 56, h-index: 3)
Qiuling Suo (Citations: 19, h-index: 2)
Laming Chen (Citations: 34, h-index: 3)
Yuxi Hu (Citations: 279, h-index: 4)
Jiasheng Zhang (Citations: 90, h-index: 5)
H. Xiong (Citations: 3, h-index: 1)
Yuzhen Huang (Citations: 89, h-index: 5)
Yue Dong (Citations: 28, h-index: 2)
Yi Yang (Citations: 715, h-index: 13)
Shuo Chang (Citations: 39, h-index: 3)
Xiaorui Gan (Citations: 20, h-index: 2)
Wenlin Chen (Citations: 144, h-index: 2)
Santanu Kolay (Citations: 222, h-index: 8)
D. Liu (Citations: 10, h-index: 2)
Jade Nie (Citations: 639, h-index: 8)
Chun-Che Yang, National Yang Ming Chiao Tung University (Citations: 36, h-index: 3)
Ellie Wen (Citations: 479, h-index: 9)
Jiyan Yang (Citations: 100, h-index: 6)
Huayu Li (Citations: 291, h-index: 6)
Chaochao Chen (Citations: 28, h-index: 2)

Abstract

Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling, stemming from inefficient modules with low Model FLOPs Utilization (MFU) and suboptimal resource allocation. We introduce Kunlun, a scalable architecture that systematically improves model efficiency and resource allocation. Our low-level optimizations include Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and Sliding Window Attention. Our high-level innovations feature Computation Skip (CompSkip) and Event-level Personalization. These advances increase MFU from 17% to 37% on NVIDIA B200 GPUs and double scaling efficiency over state-of-the-art methods. Kunlun is now deployed in major Meta Ads models, delivering significant production impact.
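The two quantities the abstract leans on, Model FLOPs Utilization and a power-law relationship between compute and model quality, can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the loss-versus-compute form loss ≈ a * C^(-b), and all numbers below are assumptions chosen for illustration, and the paper's own definition of scaling efficiency may differ.

```python
import numpy as np


def mfu(model_flops_per_step: float, step_time_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs Utilization: achieved model FLOP/s divided by hardware peak FLOP/s."""
    achieved_flops_per_s = model_flops_per_step / step_time_s
    return achieved_flops_per_s / peak_flops_per_s


def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**(-b) by least squares in log-log space.

    Returns (a, b). A straight line in log-log space is what "predictable
    power-law scaling" looks like under this assumed functional form.
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return float(np.exp(intercept)), float(-slope)


# Hypothetical hardware and workload numbers, for illustration only.
example_mfu = mfu(model_flops_per_step=3.7e14, step_time_s=1.0, peak_flops_per_s=1e15)

# Synthetic, noise-free data generated from a known power law (a=2.0, b=0.05).
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = 2.0 * compute ** -0.05
a, b = fit_power_law(compute, loss)

print(f"MFU = {example_mfu:.2f}, fitted a = {a:.3f}, fitted b = {b:.3f}")
```

With real measurements the fit would be noisy, and the residuals in log-log space give a rough sense of how predictable the scaling actually is.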

Citations: 2 (0 influential); Altmetric: 8; Score: 42.0
