2505.00949 May 02, 2025 cs.AI

Llama-Nemotron: 효율적인 추론 모델

Llama-Nemotron: Efficient Reasoning Models

M. Patwary

Citations: 8,244

h-index: 28

Brandon Norick

Citations: 3,748

h-index: 10

A. Bercovich

Citations: 1,984

h-index: 13

Itay Levy

NVIDIA

Citations: 325

h-index: 5

Izik Golan

Citations: 186

h-index: 4

Mohammad Dabbah

Citations: 130

h-index: 4

Ran El-Yaniv

Citations: 359

h-index: 9

Omri Puny

Citations: 570

h-index: 9

Ido Galil

Technion

Citations: 228

h-index: 7

Zach Moshe

Citations: 461

h-index: 8

Tomer Ronen

Citations: 251

h-index: 7

Najeeb Nabwani

Citations: 130

h-index: 4

Ido Shahaf

Citations: 395

h-index: 10

Oren Tropp

Citations: 130

h-index: 4

Ehud Karpas

Citations: 440

h-index: 7

Ran Zilberstein

Citations: 196

h-index: 6

Jiaqi Zeng

Citations: 1,033

h-index: 12

Soumye Singhal

NVIDIA

Citations: 475

h-index: 10

A. Bukharin

Citations: 1,771

h-index: 13

Yian Zhang

ML^2

Citations: 2,427

h-index: 11

Gerald Shen

Citations: 860

h-index: 11

Ameya Mahabaleshwarkar

Citations: 627

h-index: 11

Bilal Kartal

Citations: 256

h-index: 7

Yoshi Suhara

Citations: 712

h-index: 12

Olivier Delalleau

Citations: 1,039

h-index: 12

Zijia Chen

Citations: 466

h-index: 9

Zhilin Wang

Citations: 123

h-index: 4

David Mosallanezhad

Citations: 331

h-index: 8

Adi Renduchintala

Citations: 389

h-index: 9

Haifeng Qian

Citations: 242

h-index: 6

Dima Rekesh

Citations: 1,509

h-index: 9

Fei Jia

Citations: 406

h-index: 6

Somshubra Majumdar

Citations: 4,304

h-index: 24

V. Noroozi

Citations: 2,585

h-index: 19

W. Ahmad

Citations: 510

h-index: 10

Sean Narenthiran

Citations: 909

h-index: 11

Aleksander Ficek

Citations: 611

h-index: 10

Mehrzad Samadi

Citations: 378

h-index: 8

Jocelyn Huang

Citations: 1,212

h-index: 12

Siddhartha Jain

Citations: 576

h-index: 7

Igor Gitman

Citations: 2,652

h-index: 16

Ivan Moshkov

Citations: 937

h-index: 10

Wei Du

Citations: 634

h-index: 8

Shubham Toshniwal

Citations: 5,059

h-index: 25

George Armstrong

Citations: 195

h-index: 7

B. Kisačanin

Citations: 759

h-index: 14

Matvei Novikov

Citations: 378

h-index: 8

Daria Gitman

Citations: 467

h-index: 7

E. Bakhturina

Citations: 1,079

h-index: 15

Prasoon Varshney

Citations: 542

h-index: 8

Jane Scowcroft

Citations: 664

h-index: 9

John Kamalu

Citations: 749

h-index: 9

Dan Su

Citations: 589

h-index: 10

Kezhi Kong

Citations: 493

h-index: 9

Markus Kliegl

Citations: 932

h-index: 13

Rabeeh Karimi

Citations: 156

h-index: 2

Ying Lin

Citations: 520

h-index: 11

S. Satheesh

Citations: 50,350

h-index: 21

Jupinder Parmar

Citations: 598

h-index: 10

Pritam Gundecha

Citations: 983

h-index: 15

Joseph Jennings

Citations: 601

h-index: 9

Shrimai Prabhumoye

Citations: 500

h-index: 13

Syeda Nahida Akter

Citations: 466

h-index: 10

Abhinav Khattar

Citations: 445

h-index: 9

Deepak Narayanan

Citations: 705

h-index: 9

R. Waleffe

Citations: 822

h-index: 13

Jimmy Zhang

Citations: 680

h-index: 9

Bor-Yiing Su

Citations: 2,725

h-index: 12

Terry Kong

Citations: 302

h-index: 6

Parth Chadha

Citations: 301

h-index: 6

Sahil Jain

Citations: 308

h-index: 7

C. Harvey

Citations: 156

h-index: 2

Elad Segal

Citations: 4,187

h-index: 9

Jining Huang

Citations: 357

h-index: 6

Sergey Kashirsky

Citations: 223

h-index: 4

R. Mcqueen

Citations: 84

h-index: 1

Izzy Putterman

Citations: 165

h-index: 4

Arun Venkatesan

Citations: 159

h-index: 3

Sherry Wu

Citations: 122

h-index: 3

Manoj Kilaru

Citations: 157

h-index: 3

Anna Warno

Citations: 84

h-index: 1

Abhilash Somasamudramath

Citations: 92

h-index: 2

Sandip Bhaskar

Citations: 84

h-index: 1

Nave Assaf

Citations: 244

h-index: 6

Shahar Mor

Citations: 172

h-index: 5

Omer Ullman Argov

Citations: 172

h-index: 5

Scot Junkin

Citations: 83

h-index: 1

Pedro Larroy

Citations: 1,140

h-index: 4

Monika Katariya

Citations: 83

h-index: 1

Marco Rovinelli

Citations: 83

h-index: 1

Viji Balas

Citations: 83

h-index: 1

Anahita Bhiwandiwalla

Citations: 615

h-index: 12

Smita Ithape

Citations: 159

h-index: 4

Yuting Wu

Citations: 88

h-index: 2

S. Velury

Citations: 103

h-index: 3

Omri Almog

Citations: 144

h-index: 3

Joyjit Daw

Citations: 208

h-index: 6

Denys Fridman

Citations: 315

h-index: 4

Erick Galinkin

Citations: 386

h-index: 9

Michael Evans

Citations: 313

h-index: 7

K. Luna

Citations: 311

h-index: 7

Leon Derczynski

Citations: 502

h-index: 9

Nikki Pope

Citations: 232

h-index: 6

E. Long

Citations: 406

h-index: 9

Guillermo Siman

Citations: 83

h-index: 1

Tomasz Grzegorzek

Citations: 394

h-index: 5

Pablo Ribalta

Citations: 460

h-index: 4

Joey Conway

Citations: 332

h-index: 7

Trisha Saar

Citations: 281

h-index: 3

Ann Guan

Citations: 226

h-index: 5

Krzysztof Pawelec

Citations: 424

h-index: 7

Shyamala Prayaga

Citations: 165

h-index: 3

Oleksii Kuchaiev

NVIDIA

Citations: 6,373

h-index: 28

Boris Ginsburg

Citations: 503

h-index: 10

O. Olabiyi

Citations: 985

h-index: 15

Kari Briski

Citations: 300

h-index: 6

Jonathan Cohen

Citations: 83

h-index: 1

Bryan Catanzaro

Citations: 4,365

h-index: 33

Jonah Alben

Citations: 2,833

h-index: 5

Yonatan Geifman

Citations: 1,899

h-index: 10

Eric Chung

Citations: 359

h-index: 8

Guyue Huang

Citations: 899

h-index: 10

G. Lam

Citations: 104

h-index: 3

V. Nguyen

Citations: 90

h-index: 2

Andrew Wang

Citations: 201

h-index: 5

Seth Schneider

Citations: 133

h-index: 4

K. Ramamoorthy

Citations: 121

h-index: 4

O.O. Romanenko

Citations: 83

h-index: 1

Nicholas Edelman

Citations: 96

h-index: 3

T. Konuk

Citations: 466

h-index: 9

M. Dong

Citations: 97

h-index: 3

Muthukrishnan Subramaniam

Citations: 87

h-index: 2

우리는 뛰어난 추론 능력, 추론 효율성, 그리고 기업용 오픈 라이선스를 제공하는 이기종 추론 모델의 개방형 제품군인 Llama-Nemotron 모델 시리즈를 소개합니다. 이 제품군은 Nano(8B), Super(49B), Ultra(253B)의 세 가지 크기로 제공되며, DeepSeek-R1과 같은 최첨단 추론 모델과 경쟁할 수 있는 성능을 발휘하는 동시에 더 우수한 추론 처리량과 메모리 효율성을 제공합니다. 본 보고서에서는 가속화된 추론을 위해 Llama 3 모델 기반의 신경망 아키텍처 탐색, 지식 증류, 지속적인 사전 훈련을 활용하고, 이어서 지도 미세 조정과 대규모 강화 학습이라는 두 가지 주요 부분으로 구성된 추론 중심의 사후 훈련 단계를 거치는 이 모델들의 훈련 절차에 대해 논의합니다. Llama-Nemotron 모델은 동적 추론 토글을 지원하는 최초의 오픈 소스 모델로서, 사용자가 추론 중에 표준 채팅 모드와 추론 모드 사이를 전환할 수 있도록 합니다. 열린 연구를 더욱 지원하고 모델 개발을 촉진하기 위해 우리는 다음의 리소스를 제공합니다: 1. 상업적 이용이 허용되는 NVIDIA 오픈 모델 라이선스 계약하에 Llama-Nemotron 추론 모델(LN-Nano, LN-Super, LN-Ultra)을 공개합니다. 2. 전체 사후 훈련 데이터셋인 Llama-Nemotron-Post-Training-Dataset을 공개합니다. 3. 훈련 코드베이스인 NeMo, NeMo-Aligner, Megatron-LM을 공개합니다.

Original Abstract

We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.

83 Citations

8 Influential

16.5 Altmetric

181.5 Score

Original PDF

AI Analysis

Korean Summary

이 논문은 NVIDIA가 Llama 3 모델을 기반으로 개발한 개방형 추론 모델 시리즈인 'Llama-Nemotron(Nano, Super, Ultra)'을 소개합니다. 이 모델들은 신경망 아키텍처 탐색(NAS), 지식 증류, 그리고 대규모 강화학습(RL)을 결합하여 DeepSeek-R1과 같은 최첨단 모델과 대등한 추론 성능을 유지하면서도 추론 효율성을 극대화했습니다. 특히 'Puzzle' 프레임워크를 통해 하드웨어 효율적인 구조로 변환되었으며, 사용자가 시스템 프롬프트를 통해 '상세한 사고(detailed thinking)' 모드를 켜거나 끌 수 있는 동적 토글 기능을 제공하여 유연한 비용 관리 및 응답 스타일 제어가 가능합니다.

Key Innovations

Puzzle 프레임워크 기반 NAS (블록 단위 증류 및 Attention 메커니즘 제거)
FFN Fusion (연속된 FFN 블록을 병합하여 레이어 깊이 및 지연 시간 감소)
동적 추론 토글 (Dynamic Reasoning Toggle) 기능
대규모 강화학습을 위한 GRPO(Group Relative Policy Optimization) 알고리즘 적용
FP8 생성 및 훈련/추론 병합을 통한 인프라 메모리 최적화

Learning & Inference Impact

추론 측면에서는 NAS와 FFN Fusion 기술을 통해 불필요한 연산을 줄여, LN-Ultra(253B) 모델이 단일 8xH100 노드에서 DeepSeek-R1보다 높은 처리량으로 구동될 수 있게 하였습니다. 학습 과정에서는 강력한 교사 모델의 추론 과정을 증류(SFT)한 후, 대규모 RL을 적용하여 학생 모델이 교사 모델의 성능 한계를 뛰어넘도록(Self-improvement) 설계되었습니다. 또한, 커리큘럼 학습 방식을 도입하여 학습 안정성을 높였습니다.

Technical Difficulty

고급

Estimated implementation complexity based on methodology.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!