2602.02988v1 Feb 03, 2026 cs.LG

NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference

Jiangyong Yu, Xiaomeng Han, Xing Hu, Chen Xu, Zhe Jiang, Dawei Yang

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks, but their deployment is often constrained by substantial memory footprints and computational costs. While prior work has made significant progress in compressing and accelerating linear layers, nonlinear layers such as SiLU, RMSNorm, and Softmax still depend heavily on high-precision floating-point operations. In this paper, we propose Non-uniform Linear Interpolation (NLI), a calibration-free, dynamic-programming-optimal, and hardware-friendly framework. NLI efficiently approximates a variety of nonlinear functions and integrates seamlessly into LLMs and other deep neural networks with almost no loss in accuracy. NLI recasts cutpoint selection as a dynamic-programming problem and, via Bellman's principle of optimality, achieves the globally minimal interpolation error in O(M×N²) time. Based on the NLI algorithm, we also design and implement a plug-and-play universal nonlinear computation unit. Hardware experiments show that the NLI Engine achieves more than a 4× improvement in computational efficiency over state-of-the-art designs.
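The abstract states the core idea without details: pick interpolation cutpoints by dynamic programming so that the total interpolation error is globally minimal. The sketch below illustrates that idea under assumptions of our own, not taken from the paper: the candidate cutpoints are a uniform grid of N points on [lo, hi], both endpoints are always kept, exactly M cutpoints are chosen, and a segment's cost is the sum of squared deviations between the function and its chord at the grid points the segment spans. The helpers `segment_errors` and `nli_cutpoints` are hypothetical names. Once segment costs are precomputed, the DP recursion runs in O(M×N²), which has the same form as the bound quoted above if M and N are read as the numbers of cutpoints and candidates (also an assumption). This is not the authors' implementation, and their error metric and candidate set may differ.

```python
import numpy as np


def segment_errors(x, y):
    """err[i, j]: sum of squared errors at the grid points strictly between
    x[i] and x[j] when f is replaced by the chord through (x[i], y[i]) and
    (x[j], y[j]). Entries with j <= i stay +inf (invalid segments)."""
    n = len(x)
    err = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(i + 1, n):
            inner_x = x[i + 1:j]
            chord = y[i] + (y[j] - y[i]) * (inner_x - x[i]) / (x[j] - x[i])
            err[i, j] = float(np.sum((y[i + 1:j] - chord) ** 2))
    return err


def nli_cutpoints(f, lo, hi, num_candidates, num_cutpoints):
    """Hypothetical helper: choose num_cutpoints of num_candidates uniform grid
    points (both endpoints forced) minimizing the summed squared interpolation
    error. The candidate grid and the error metric are assumptions."""
    if not 2 <= num_cutpoints <= num_candidates:
        raise ValueError("need 2 <= num_cutpoints <= num_candidates")
    x = np.linspace(lo, hi, num_candidates)
    y = f(x)
    err = segment_errors(x, y)  # precomputed once; the DP below is O(M*N^2)
    m, n = num_cutpoints, num_candidates
    # dp[k, j]: least error covering [x[0], x[j]] with k+1 cutpoints, the last at j
    dp = np.full((m, n), np.inf)
    parent = np.zeros((m, n), dtype=int)
    dp[0, 0] = 0.0
    for k in range(1, m):
        for j in range(1, n):
            # Bellman step: best prefix with k cutpoints ending at i, plus segment (i, j)
            cand = dp[k - 1, :j] + err[:j, j]
            i = int(np.argmin(cand))
            dp[k, j] = cand[i]
            parent[k, j] = i
    # Walk parent pointers back from the forced right endpoint x[n-1]
    idx = [n - 1]
    for k in range(m - 1, 0, -1):
        idx.append(int(parent[k, idx[-1]]))
    idx.reverse()
    return x[idx], y[idx]


def silu(x):
    return x / (1.0 + np.exp(-x))


# Example: 16 cutpoints for SiLU on [-8, 8], chosen from 129 candidates
xs, ys = nli_cutpoints(silu, -8.0, 8.0, num_candidates=129, num_cutpoints=16)
grid = np.linspace(-8.0, 8.0, 10_001)
# np.interp stands in for the segment lookup + linear interpolation a hardware unit would do
approx = np.interp(grid, xs, ys)
print("cutpoints:", np.round(xs, 3))
print("max |error| on [-8, 8]:", np.max(np.abs(approx - silu(grid))))
```

Because the assumed segment cost is additive, Bellman's principle applies directly: the best placement ending at candidate j extends the best placement ending at some earlier candidate i, so no combination of cutpoints has to be enumerated. At inference time only the M (x, y) pairs need to be stored; with per-segment slopes precomputed, evaluating a point reduces to locating its segment plus a subtraction and a multiply-add.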
