2601.05047v3 Jan 08, 2026 cs.AR

Challenges and Research Directions for Large Language Model Inference Hardware

Original Abstract

Large Language Model (LLM) inference is hard. The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by recent AI trends, the primary challenges are memory and interconnect rather than compute. To address these challenges, we highlight four architecture research opportunities: High Bandwidth Flash for 10X memory capacity with HBM-like bandwidth; Processing-Near-Memory and 3D memory-logic stacking for high memory bandwidth; and low-latency interconnect to speed up communication. While our focus is datacenter AI, we also review their applicability for mobile devices.
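The abstract's claim that Decode is limited by memory rather than compute can be made concrete with a back-of-the-envelope roofline estimate. This sketch is not taken from the paper. The accelerator (1 PFLOP/s peak compute, 4 TB/s memory bandwidth), the 70B-parameter dense model, and the 2-byte (BF16) weights are hypothetical values chosen for illustration. The model also ignores KV-cache traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical accelerator; these numbers are illustrative, not from the paper."""
    peak_flops: float     # peak compute, FLOP/s
    mem_bandwidth: float  # memory bandwidth, bytes/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) above which work becomes compute-bound."""
        return self.peak_flops / self.mem_bandwidth


def decode_step(params: float, batch: int, bytes_per_param: float,
                hw: Accelerator) -> tuple[float, float, float]:
    """Roofline estimate for one Decode step of a dense Transformer.

    Assumes every weight is read from memory once per step and contributes one
    multiply-add (2 FLOPs) per sequence in the batch. KV-cache reads are ignored,
    which makes this an optimistic (less memory-bound) estimate.
    Returns (arithmetic intensity in FLOP/byte, compute time in s, memory time in s).
    """
    flops = 2.0 * params * batch           # one multiply-add per weight per sequence
    bytes_moved = params * bytes_per_param # weights streamed once per step
    intensity = flops / bytes_moved
    t_compute = flops / hw.peak_flops
    t_memory = bytes_moved / hw.mem_bandwidth
    return intensity, t_compute, t_memory


if __name__ == "__main__":
    hw = Accelerator(peak_flops=1e15, mem_bandwidth=4e12)  # hypothetical: 1 PFLOP/s, 4 TB/s
    print(f"ridge point: {hw.ridge_point:.0f} FLOP/byte")
    for batch in (1, 64, 256):
        ai, tc, tm = decode_step(params=70e9, batch=batch, bytes_per_param=2, hw=hw)
        bound = "memory" if tm > tc else "compute"
        print(f"batch={batch:4d}  intensity={ai:6.1f} FLOP/B  "
              f"compute={tc * 1e3:6.2f} ms  memory={tm * 1e3:6.2f} ms  -> {bound}-bound")
```

Under these assumed numbers, a single-sequence Decode step performs about 1 FLOP per byte of weights read, far below the hypothetical hardware's ridge point, so step time is set by memory bandwidth. Only very large batches approach the compute roof, and each extra sequence in the batch adds its own KV cache, which is where memory capacity becomes the constraint.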
