2605.03667v1 May 05, 2026 cs.LG

ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity

Lu Yin

Citations: 23

h-index: 3

Shiwei Liu

Citations: 1,675

h-index: 20

Jiaxi Li

Citations: 12

h-index: 2

Li Shen

Citations: 77

h-index: 4

Jinjin Xu

Citations: 38

h-index: 3

Yuhui Liu

Citations: 0

h-index: 0

Wenwu Wang

Citations: 18

h-index: 3

Xilu Wang

Citations: 1,526

h-index: 13

Original Abstract

Large Language Models (LLMs) have achieved remarkable capabilities, but their immense computational demands during training remain a critical bottleneck for widespread adoption. Low-rank training has received attention in recent years because it can significantly reduce training memory usage. Meanwhile, applying 2:4 structured sparsity to weights and activations, which leverages NVIDIA GPU support for the 2:4 structured sparse format, has become a promising direction. However, existing low-rank methods often leave activation matrices at full rank, so activations dominate memory consumption and limit throughput during large-batch training. Furthermore, directly applying sparsity to weights often leads to non-negligible performance degradation. To achieve efficient pre-training of LLMs, this paper proposes ELAS (Efficient pre-training of Low-rank LLMs via 2:4 Activation Sparsity), a novel framework that combines low-rank models with 2:4 activation sparsity. ELAS applies squared ReLU activation functions to the feed-forward networks in low-rank models and implements 2:4 structured sparsity on the activations after the squared ReLU operation. We evaluated ELAS through pre-training experiments on LLaMA models ranging from 60M to 1B parameters. The results demonstrate that ELAS maintains performance with minimal degradation after applying 2:4 activation sparsity, while achieving training and inference acceleration. Moreover, ELAS reduces activation memory overhead, particularly with large batch sizes. Code is available at ELAS Repo.
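
To make the activation step in the abstract concrete, here is a minimal pure-Python sketch of squared ReLU followed by 2:4 structured sparsification. It is an illustration under stated assumptions, not the authors' implementation: the function names `squared_relu` and `sparsify_2_4` are hypothetical, and keeping the two largest-magnitude values in each group of four is an assumed selection rule that the abstract does not spell out. The only constraint taken as given is the 2:4 pattern itself, meaning at most two nonzero values in every contiguous group of four.

```python
def squared_relu(x):
    """Squared ReLU: max(v, 0) ** 2, applied elementwise to a list of floats."""
    return [max(v, 0.0) ** 2 for v in x]


def sparsify_2_4(row):
    """Zero all but the 2 largest-magnitude values in each contiguous group of 4.

    Assumption: magnitude-based top-2 selection; the abstract only names the
    2:4 pattern, not how the surviving entries are chosen.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Rank positions by magnitude; ties resolve to the earlier index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


# Toy pre-activation row of a feed-forward layer (values chosen arbitrarily).
pre_activation = [0.5, -1.0, 2.0, 1.5, -0.5, 1.0, 0.25, 3.0]
activated = squared_relu(pre_activation)
sparse = sparsify_2_4(activated)
print(activated)  # [0.25, 0.0, 4.0, 2.25, 0.0, 1.0, 0.0625, 9.0]
print(sparse)     # [0.0, 0.0, 4.0, 2.25, 0.0, 1.0, 0.0, 9.0]
```

In this toy row, squared ReLU already zeroes the negative inputs, so the 2:4 step only drops the smallest remaining values (0.25 and 0.0625). A real implementation would operate on GPU tensors and use hardware sparse kernels rather than Python lists.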

