2604.03044v1 Apr 03, 2026 cs.CL

JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency

Changjiang Jiang (citations: 24, h-index: 3)
Junwu Xiong (citations: 197, h-index: 2)
An Zhang (citations: 131, h-index: 6)
Wei Liu (citations: 32, h-index: 3)
Qi Yuan (citations: 2, h-index: 1)
Aichen Cai (citations: 0, h-index: 0)
Anyu Li (citations: 7, h-index: 2)
Bo Zhang (citations: 60, h-index: 3)
Bo Cai (citations: 46, h-index: 4)
Changlong Li (citations: 5, h-index: 1)
Chang-Tien Lu (citations: 39, h-index: 3)
Chao Xue (citations: 8, h-index: 2)
Chao Liang (citations: 14, h-index: 2)
Cheng Zhang (citations: 3, h-index: 1)
Dong Liu (citations: 0, h-index: 0)
Fei Wang (citations: 74, h-index: 2)
Guoqiang Huang (citations: 93, h-index: 2)
Hai-Jian Ke (citations: 3, h-index: 1)
Han Lin (citations: 87, h-index: 3)
Hao Wang (citations: 53, h-index: 3)
Jingyuan Miao (citations: 3, h-index: 1)
Jiachen Zhang (citations: 23, h-index: 2)
Jia Shi (citations: 1, h-index: 1)
Jifeng Zhu (citations: 159, h-index: 4)
Jingjing Qian (citations: 11, h-index: 2)
Jun Luo (citations: 24, h-index: 3)
L. So (citations: 0, h-index: 0)
Liang Huang (citations: 54, h-index: 2)
M. Ke (citations: 5, h-index: 1)
Mingyan Li (citations: 39, h-index: 1)
Panfeng Shi (citations: 12, h-index: 2)
Penghao Hao (citations: 4, h-index: 1)
Qi Wang (citations: 5, h-index: 1)
Qiannan Lai (citations: 0, h-index: 0)
Qingyu Yin (citations: 42, h-index: 4)
Qiong Cao (citations: 73, h-index: 5)
Qixiang Wang (citations: 380, h-index: 8)
Rong Bian (citations: 8, h-index: 2)
Rongduo Han (citations: 25, h-index: 3)
Shaoqiang Zheng (citations: 62, h-index: 5)
Shi Hu (citations: 22, h-index: 3)
S. Suo (citations: 4, h-index: 2)
Shijie Ren (citations: 38, h-index: 4)
Shijin Zhang (citations: 28, h-index: 4)
Shiying Fan (citations: 19, h-index: 2)
Shuai Xie (citations: 14, h-index: 2)
Tianyi Zhang (citations: 378, h-index: 2)
Wen-Tao Tan (citations: 9, h-index: 2)
Xianghan Meng (citations: 21, h-index: 3)
Xiaodong He (citations: 10, h-index: 1)
X. Pan (citations: 11, h-index: 2)
Xiran Wang (citations: 215, h-index: 3)
Xuyang Peng (citations: 14, h-index: 2)
Ya Zhang (citations: 55, h-index: 2)
Yang Liu (citations: 15, h-index: 2)
Yang Duan (citations: 12, h-index: 2)
Yanxu Chen (citations: 60, h-index: 2)
Yichen Gong (citations: 411, h-index: 4)
Yidan Huang (citations: 4, h-index: 1)
Yifei Liu (citations: 38, h-index: 3)
Yinhao Bai (citations: 156, h-index: 4)
Yongqiang Liu (citations: 55, h-index: 2)
Yue Zhang (citations: 37, h-index: 4)
Yuqi Zhang (citations: 136, h-index: 3)
Zerui Xie (citations: 0, h-index: 0)
Zhenfang Wang (citations: 1, h-index: 1)
Z. Shen (citations: 12, h-index: 2)
Zheyuan Liu (citations: 11, h-index: 1)
Zhuwei Zeng (citations: 23, h-index: 1)

Original Abstract

We introduce JoyAI-LLM Flash, an efficient Mixture-of-Experts (MoE) language model designed to redefine the trade-off between strong performance and token efficiency in the sub-50B parameter regime. JoyAI-LLM Flash is pretrained on a massive corpus of 20 trillion tokens and further optimized through a rigorous post-training pipeline, including supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and large-scale reinforcement learning (RL) across diverse environments. To improve token efficiency, JoyAI-LLM Flash strategically balances thinking and non-thinking cognitive modes and introduces FiberPO, a novel RL algorithm inspired by fibration theory that decomposes trust-region maintenance into global and local components, providing unified multi-scale stability control for LLM policy optimization. To enhance architectural sparsity, the model comprises 48B total parameters while activating only 2.7B parameters per forward pass, achieving a substantially higher sparsity ratio than contemporary industry-leading models of comparable scale. To further improve inference throughput, we adopt a joint training-inference co-design that incorporates dense Multi-Token Prediction (MTP) and Quantization-Aware Training (QAT). We release the checkpoints for both JoyAI-LLM-48B-A3B Base and its post-trained variants on Hugging Face to support the open-source community.
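
To make the sparsity claim concrete, the short Python sketch below works through the arithmetic implied by the two parameter counts quoted in the abstract (48B total, 2.7B active per forward pass). The abstract does not define "sparsity ratio", so reading it as total parameters divided by active parameters is an assumption, and the function names are hypothetical helpers introduced only for this illustration.

```python
# Figures quoted in the abstract; everything else here is illustrative.
TOTAL_PARAMS = 48e9    # 48B total parameters
ACTIVE_PARAMS = 2.7e9  # 2.7B parameters activated per forward pass


def active_fraction(total: float, active: float) -> float:
    """Share of the parameters that take part in one forward pass."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active / total


def total_to_active_ratio(total: float, active: float) -> float:
    """Total-to-active ratio: an assumed reading of 'sparsity ratio'."""
    if active <= 0:
        raise ValueError("active must be positive")
    return total / active


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
ratio = total_to_active_ratio(TOTAL_PARAMS, ACTIVE_PARAMS)

print(f"active fraction: {fraction:.1%}")      # about 5.6% of the weights
print(f"total-to-active ratio: {ratio:.1f}x")  # about 17.8x
```

Under this reading, roughly one parameter in eighteen is used for any given token. The sketch says nothing about how experts are routed or how the count splits between shared and expert layers, since the abstract does not give those details.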
