2604.17172v1 Apr 19, 2026 cs.DC

CCCL: An In-GPU Compression-Based Collective Communication Library

CCCL: In-GPU Compression-Coupled Collective Communication

Ziming Mao (Citations: 113, h-index: 6)
ChonLam Lao (Citations: 33, h-index: 3)
Zhiying Xu (Citations: 13, h-index: 2)
Delong Meng (Citations: 14, h-index: 2)
Jun Wu (Citations: 9, h-index: 2)
Ion Stoica (Citations: 681, h-index: 6)
Shuang Ma (Citations: 587, h-index: 6)
Zhuang Wang (Citations: 248, h-index: 7)
Jiangjie Zhen (Citations: 172, h-index: 2)
Yida Wang (Citations: 32, h-index: 4)
Yang Zhou (Citations: 71, h-index: 4)

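Collective communication incurs significant overhead in large language model (LLM) workloads. Overlapping communication with computation at the application level is a common strategy, but it often requires substantial code changes and is unsuitable for many workloads (e.g., tensor parallelism and expert parallelism). This paper proposes CCCL, a compression-based collective communication library. CCCL supports allreduce, alltoall, and send/recv, and can be integrated into existing applications without any user-side code changes. CCCL tightly fuses its compression kernels to minimize memory accesses and integrates with NCCL to eliminate the data coalescing stage, which makes it fast enough (up to 3x NVLink bandwidth) to keep up with communication. In the evaluation, CCCL improves end-to-end throughput by up to 10.1% on vLLM PD disaggregation workloads and microbenchmark throughput by up to 30%.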

Original Abstract

Collective communication incurs significant overhead in LLM workloads. Although overlapping communication with computation at the application level is a common strategy, it often requires substantial code modifications and is impractical for many workloads (e.g., tensor and expert parallelism). We present CCCL, a built-in compression-based collective communication library that supports operations such as allreduce, alltoall, and send/recv without requiring any user-side changes, thereby enabling seamless adoption in existing applications. CCCL tightly fuses compression kernels to minimize memory accesses and integrates with NCCL to eliminate the data coalescing stage, making it fast enough (up to 3x NVLink bandwidth) to sustain communication. Our evaluation shows that CCCL improves end-to-end throughput in vLLM PD disaggregation workloads by up to 10.1% and microbenchmark throughput by up to 30%.
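To see why the abstract stresses that compression must be "fast enough" relative to NVLink, a back-of-the-envelope cost model helps. The sketch below is not from the paper: it assumes compression and transfer are perfectly pipelined, reads the "up to 3x NVLink bandwidth" figure as compressor throughput, and uses a hypothetical compression ratio of 1.5. The function names and all numbers other than the 3x factor are illustrative assumptions.

```python
def plain_time(size_bytes: float, link_bw: float) -> float:
    """Seconds to send the uncompressed buffer over the link."""
    return size_bytes / link_bw


def compressed_time(size_bytes: float, link_bw: float,
                    ratio: float, codec_bw: float) -> float:
    """Seconds for a pipelined compress-and-send (assumed model).

    Assumes compression overlaps perfectly with transfer, so the
    slower of the two stages determines the total time.
    """
    codec_stage = size_bytes / codec_bw            # time to compress the input
    link_stage = (size_bytes / ratio) / link_bw    # time to send compressed bytes
    return max(codec_stage, link_stage)


def speedup(size_bytes: float, link_bw: float,
            ratio: float, codec_bw: float) -> float:
    """Speedup of compressed over plain transfer (>1 means compression helps)."""
    return plain_time(size_bytes, link_bw) / compressed_time(
        size_bytes, link_bw, ratio, codec_bw)


if __name__ == "__main__":
    link = 1.0   # link bandwidth, normalized to 1
    size = 1.0   # message size, normalized to 1
    ratio = 1.5  # hypothetical compression ratio, not a figure from the paper
    for codec_factor in (0.5, 1.0, 3.0):
        s = speedup(size, link, ratio, codec_bw=codec_factor * link)
        print(f"codec at {codec_factor}x link bandwidth: speedup {s:.2f}")
```

Under this model, a compressor slower than the link turns compression into a net slowdown, while a compressor running well above link bandwidth lets the transfer realize the full compression ratio.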

