arXiv:2602.11287v2, Feb 11, 2026, cs.LG

HiFloat4 Format for Language Model Inference

Jing Huang, Yun Xu, Ziwei Yu, Xin Wang, Yuanyong Luo, Yu Cheng, Kai Zhang, Ke Hong, Xin Ma, Anping Tong, Guipeng Hu, Mehran Taghian, Peng Wu, Guanglin Li, Yunke Peng, Tianchi Hu, Minqi Chen, M. B. Mi, Hu Liu, Xiping Zhou, Junsong Wang, Qiang Lin, Heng Liao

Abstract

This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value. The metadata specifies a three-level scaling hierarchy, capturing inter- and intra-group dynamic range while improving the utilization of the representational space. In addition, the large 64-element group size enables matrix multiplications to be executed largely in fixed-point arithmetic, significantly reducing hardware area and power consumption. To evaluate the proposed format, we conducted inference experiments on several language models, including LLaMA, Qwen, Mistral, DeepSeek-V3.1, and LongCat. Results show that HiF4 achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
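The storage figure follows from the unit size: (64 × 4 + 32) / 64 = 4.5 bits per value. This matches the average of an NVFP4 block, where 16 FP4 (E2M1) values share one 8-bit E4M3 scale; NVFP4's per-tensor FP32 scale is ignored here. The sketch below checks that arithmetic. Under loudly stated assumptions, it also shows how a three-level scaling hierarchy can fit in 32 bits, and why power-of-two scales shared across a large group let a dot product run as integer multiply, shift, and add.

The bit split is hypothetical and is not taken from the paper:

- an 8-bit exponent E for the whole unit,
- a 2-bit offset for each 16-element sub-group,
- a 1-bit offset for each 4-element micro-group.

This gives 8 + 4·2 + 16·1 = 32 bits. Elements are assumed to be symmetric 4-bit integers in [-7, 7]. All names (`quantize_unit`, `dot_fixed_point`, and so on) are illustrative only.

```python
import math
import random

# Facts from the abstract: 64 four-bit elements share 32 bits of scaling metadata.
GROUP, ELEM_BITS, META_BITS = 64, 4, 32

# HYPOTHETICAL three-level layout (an assumption, not the paper's bit allocation):
#   level 1: one 8-bit shared exponent E for all 64 elements
#   level 2: 4 sub-groups of 16 elements, each with a 2-bit exponent offset d2 in 0..3
#   level 3: 16 micro-groups of 4 elements, each with a 1-bit exponent offset d3 in 0..1
SUB, MICRO = 16, 4
LAYOUT_BITS = 8 + (GROUP // SUB) * 2 + (GROUP // MICRO) * 1  # 8 + 8 + 16 = 32
QMAX = 7      # assumed element encoding: symmetric 4-bit integers in [-7, 7]
E_MIN = -127  # assumed exponent floor, used for all-zero micro-groups
# Clamping E to an 8-bit range is omitted for brevity.


def bits_per_value(group, elem_bits, meta_bits):
    """Average storage per element, shared metadata included."""
    return (group * elem_bits + meta_bits) / group


def needed_exp(amax):
    """Smallest integer e with amax / 2**e <= QMAX (E_MIN if amax is 0)."""
    if amax == 0.0:
        return E_MIN
    e = math.ceil(math.log2(amax / QMAX))
    while amax / math.ldexp(1.0, e) > QMAX:  # guard against log2 rounding
        e += 1
    return max(e, E_MIN)


def quantize_unit(x):
    """Quantize 64 floats to (E, d2, d3, q) under the hypothetical layout."""
    assert len(x) == GROUP
    e_micro = [needed_exp(max(abs(v) for v in x[i:i + MICRO]))
               for i in range(0, GROUP, MICRO)]
    E = max(e_micro)                    # level 1: covers the largest micro-group
    per_sub = SUB // MICRO
    d2 = []
    for s in range(GROUP // SUB):       # level 2: shift each sub-group down
        e_sub = max(e_micro[s * per_sub:(s + 1) * per_sub])
        d2.append(min(3, E - e_sub))
    d3, q = [], []
    for m, e_m in enumerate(e_micro):   # level 3: optional extra shift by 1
        base = E - d2[m // per_sub]
        d3.append(min(1, base - e_m))
        scale = math.ldexp(1.0, base - d3[m])
        for v in x[m * MICRO:(m + 1) * MICRO]:
            q.append(max(-QMAX, min(QMAX, round(v / scale))))
    return E, d2, d3, q


def micro_offsets(d2, d3):
    """Total exponent offset (0..4) below E for each micro-group."""
    per_sub = SUB // MICRO
    return [d2[m // per_sub] + d3[m] for m in range(GROUP // MICRO)]


def dequantize_unit(E, d2, d3, q):
    """Element i is q[i] * 2**(E - offset of its micro-group)."""
    offs = micro_offsets(d2, d3)
    return [q[i] * math.ldexp(1.0, E - offs[i // MICRO]) for i in range(GROUP)]


def dot_fixed_point(ua, ub):
    """Dot product of two units using integer multiply, shift and add only,
    then a single power-of-two rescale (per-operand offsets sum to at most 8)."""
    Ea, qa, oa = ua[0], ua[3], micro_offsets(ua[1], ua[2])
    Eb, qb, ob = ub[0], ub[3], micro_offsets(ub[1], ub[2])
    acc = 0
    for m in range(GROUP // MICRO):
        partial = sum(qa[i] * qb[i] for i in range(m * MICRO, (m + 1) * MICRO))
        acc += partial << (8 - oa[m] - ob[m])  # align to the finest scale
    return math.ldexp(float(acc), Ea + Eb - 8)


if __name__ == "__main__":
    print(bits_per_value(GROUP, ELEM_BITS, META_BITS))  # 4.5
    print(bits_per_value(16, 4, 8))  # NVFP4-style block (per-tensor scale ignored): 4.5
    print(LAYOUT_BITS)  # 32
    random.seed(0)
    a = [random.gauss(0, 1) * (10 if i < 8 else 1) for i in range(GROUP)]
    b = [random.gauss(0, 1) for _ in range(GROUP)]
    ua, ub = quantize_unit(a), quantize_unit(b)
    ref = sum(x * y for x, y in zip(dequantize_unit(*ua), dequantize_unit(*ub)))
    print(dot_fixed_point(ua, ub) == ref)  # True: integer path matches float path
```

In this sketch, every micro-group scale is 2^(E − offset), with an offset of at most 4 per operand. Each product therefore sits at most 2^8 above the common unit 2^(E_a + E_b − 8). As a result, the accumulator stays an ordinary integer, and only one floating-point rescale is needed per pair of units. The real HiF4 encoding may differ in its element format, offset widths, and rounding.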
