2604.03044v1 Apr 03, 2026 cs.CL

JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency

Changjiang Jiang, Junwu Xiong, An Zhang, Wei Liu, Qi Yuan, Aichen Cai, Anyu Li, Bo Zhang, Bo Cai, Changlong Li, Chang-Tien Lu, Chao Xue, Chao Liang, Cheng Zhang, Dong Liu, Fei Wang, Guoqiang Huang, Hai-Jian Ke, Han Lin, Hao Wang, Jingyuan Miao, Jiachen Zhang, Jia Shi, Jifeng Zhu, Jingjing Qian, Jun Luo, L. So, Liang Huang, M. Ke, Mingyan Li, Panfeng Shi, Penghao Hao, Qi Wang, Qiannan Lai, Qingyu Yin, Qiong Cao, Qixiang Wang, Rong Bian, Rongduo Han, Shaoqiang Zheng, Shi Hu, S. Suo, Shijie Ren, Shijin Zhang, Shiying Fan, Shuai Xie, Tianyi Zhang, Wen-Tao Tan, Xianghan Meng, Xiaodong He, X. Pan, Xiran Wang, Xuyang Peng, Ya Zhang, Yang Liu, Yang Duan, Yanxu Chen, Yichen Gong, Yidan Huang, Yifei Liu, Yinhao Bai, Yongqiang Liu, Yue Zhang, Yuqi Zhang, Zerui Xie, Zhenfang Wang, Z. Shen, Zheyuan Liu, Zhuwei Zeng

Abstract

We introduce JoyAI-LLM Flash, an efficient Mixture-of-Experts (MoE) language model designed to redefine the trade-off between strong performance and token efficiency in the sub-50B parameter regime. JoyAI-LLM Flash is pretrained on a massive corpus of 20 trillion tokens and further optimized through a rigorous post-training pipeline, including supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and large-scale reinforcement learning (RL) across diverse environments. To improve token efficiency, JoyAI-LLM Flash strategically balances "thinking" and "non-thinking" cognitive modes and introduces FiberPO, a novel RL algorithm inspired by fibration theory that decomposes trust-region maintenance into global and local components, providing unified multi-scale stability control for LLM policy optimization. To enhance architectural sparsity, the model comprises 48B total parameters while activating only 2.7B parameters per forward pass, achieving a substantially higher sparsity ratio than contemporary industry-leading models of comparable scale. To further improve inference throughput, we adopt a joint training-inference co-design that incorporates dense Multi-Token Prediction (MTP) and Quantization-Aware Training (QAT). We release the checkpoints for both JoyAI-LLM-48B-A3B Base and its post-trained variants on Hugging Face to support the open-source community.
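
The abstract describes FiberPO only at this high level: trust-region maintenance is decomposed into a global and a local component. As a purely illustrative sketch of that idea, the function below combines a PPO-style token-level clipped surrogate (the local component) with a clipped sequence-level importance ratio (the global component). The name fiberpo_surrogate, the geometric-mean sequence ratio, and the clipping thresholds are assumptions made for this example, not the published algorithm.

```python
import numpy as np

def fiberpo_surrogate(logp_new, logp_old, advantages,
                      eps_local=0.2, eps_global=0.05):
    """Illustrative two-scale clipped policy objective (not the paper's FiberPO).

    logp_new, logp_old: per-token log-probabilities of one sampled response, shape (T,)
    advantages:         per-token advantage estimates, shape (T,)
    """
    # Local component: token-level importance ratios with PPO-style clipping.
    ratio_tok = np.exp(logp_new - logp_old)
    clipped_tok = np.clip(ratio_tok, 1.0 - eps_local, 1.0 + eps_local)
    local_term = np.minimum(ratio_tok * advantages, clipped_tok * advantages)

    # Global component: a sequence-level ratio (geometric mean of token ratios)
    # kept inside a tighter trust region.
    ratio_seq = np.exp(np.mean(logp_new - logp_old))
    ratio_seq = np.clip(ratio_seq, 1.0 - eps_global, 1.0 + eps_global)

    # Couple the two scales so that neither the overall sequence-level shift
    # nor any individual token update can drift outside its own trust region.
    return ratio_seq * np.mean(local_term)
```

In this toy form, the local clipping limits per-token updates while the global clip bounds the shift of the whole sampled sequence's likelihood; how the actual algorithm couples the two scales is not specified in the abstract.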
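The quoted parameter budget (48B total, 2.7B activated per forward pass) implies that only about 5.6% of the weights participate in computing any single token. A quick back-of-the-envelope check using only the numbers from the abstract:

```python
TOTAL_PARAMS = 48e9    # 48B total parameters (from the abstract)
ACTIVE_PARAMS = 2.7e9  # 2.7B parameters activated per forward pass (from the abstract)

activation_ratio = ACTIVE_PARAMS / TOTAL_PARAMS   # ~0.056
sparsity_ratio = 1.0 - activation_ratio           # ~0.944

print(f"activation ratio: {activation_ratio:.1%}")  # -> 5.6%
print(f"sparsity ratio:   {sparsity_ratio:.1%}")    # -> 94.4%
```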
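Dense Multi-Token Prediction is mentioned only as one half of the training-inference co-design, without architectural details. Assuming a generic dense MTP setup, in which one extra linear head per future offset is trained at every position, a minimal PyTorch sketch could look like the following; the class name DenseMTPHeads, the per-offset heads, and the summed cross-entropy loss are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMTPHeads(nn.Module):
    """Hedged sketch of dense multi-token prediction on top of a decoder
    (a generic MTP layout, not necessarily the paper's)."""

    def __init__(self, hidden_size: int, vocab_size: int, n_future: int = 2):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, vocab_size, bias=False) for _ in range(n_future)]
        )

    def forward(self, hidden: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        """hidden: (B, T, H) decoder states; targets: (B, T) token ids.
        Returns the cross-entropy summed over all future offsets."""
        loss = hidden.new_zeros(())
        for k, head in enumerate(self.heads, start=1):
            # Position t predicts token t + k, so drop the last k positions.
            logits = head(hidden[:, :-k])   # (B, T-k, V)
            labels = targets[:, k:]         # (B, T-k)
            loss = loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), labels.reshape(-1)
            )
        return loss
```

Heads of this kind are typically reused at inference time for speculative or parallel decoding, which is the usual route by which MTP improves throughput.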
