2605.05365v1 May 06, 2026 cs.AI

ZAYA1-8B Technical Report

P. Yuvraj

Citations: 16,340

h-index: 8

Beren Millidge

Citations: 2,358

h-index: 26

Robert Washbourne

Citations: 2

h-index: 1

Rishi Iyer

Citations: 9

h-index: 2

Tomas Figliolia

Citations: 16

h-index: 2

Henry Zheng

Citations: 14

h-index: 3

Ryan Lorig-Roach

Citations: 1,677

h-index: 9

Su Yang

Citations: 44

h-index: 3

Quentin Anthony

Citations: 210

h-index: 5

Yury Tokpanov

Citations: 209

h-index: 5

Xiao Yang

Citations: 0

h-index: 0

Ganesh Nanduru

Citations: 25

h-index: 1

Stephen Ebert

Citations: 0

h-index: 0

Praneeth Medepalli

Citations: 7

h-index: 2

Skyler Szot

Citations: 2

h-index: 1

Srivatsan Rajagopal

Citations: 3

h-index: 1

Alex Ong

Citations: 9

h-index: 2

Bhavana Mehta

Citations: 3

h-index: 1

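This report introduces ZAYA1-8B, a reasoning-focused Mixture-of-Experts (MoE) model with 700M active parameters and 8B total parameters, built on Zyphra's MoE++ architecture. Its core pretraining, midtraining, and supervised fine-tuning (SFT) were carried out on AMD's integrated compute, networking, and software platform. With fewer than 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several difficult mathematics and coding benchmarks and stays competitive with much larger open-weight models. The model was trained from scratch for reasoning: reasoning data was included from the pretraining stage onward, using an answer-preserving trimming scheme. Post-training proceeds in four reinforcement learning stages: a reasoning warmup on math and puzzles; a 400-task RLVE-Gym curriculum; math and code reinforcement learning using test-time compute traces and synthetic code environments built from competitive-programming material; and behavioral reinforcement learning for chat and instruction following. The report also introduces Markovian RSA, a test-time compute method that recursively aggregates parallel reasoning traces while passing only a bounded-length reasoning tail from one round to the next. In test-time evaluation, Markovian RSA raises ZAYA1-8B's accuracy to 91.9% on AIME'25 and 89.6% on HMMT'25 while carrying forward only a 4K-token tail, narrowing the gap to much larger reasoning models such as Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High.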

Original Abstract

We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform. With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-weight reasoning models. ZAYA1-8B was trained from scratch for reasoning, with reasoning data included from pretraining onward using an answer-preserving trimming scheme. Post-training uses a four-stage RL cascade: reasoning warmup on math and puzzles; a 400-task RLVE-Gym curriculum; math and code RL with test-time compute traces and synthetic code environments built from competitive-programming references; and behavioral RL for chat and instruction following. We also introduce Markovian RSA, a test-time compute method that recursively aggregates parallel reasoning traces while carrying forward only bounded-length reasoning tails between rounds. In TTC evaluation, Markovian RSA raises ZAYA1-8B to 91.9% on AIME'25 and 89.6% on HMMT'25 while carrying forward only a 4K-token tail, narrowing the gap to much larger reasoning models including Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High.
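
The abstract describes Markovian RSA only at a high level: parallel reasoning traces are aggregated recursively, and only a bounded-length tail of each trace is carried between rounds. The Python sketch below is one possible reading of that description, not the authors' implementation. The function names, the `generate` callable, the random subset sampling, the prompt wording, and the use of whitespace-separated words as "tokens" are all illustrative assumptions.

```python
import random
from typing import Callable, List

Trace = List[str]  # assumption: a trace is a list of tokens


def tail(trace: Trace, max_tokens: int) -> Trace:
    """Keep only the last max_tokens tokens of a trace (the bounded tail)."""
    return trace[-max_tokens:] if max_tokens > 0 else []


def markovian_aggregate(
    question: str,
    generate: Callable[[str], Trace],  # hypothetical model call: prompt -> trace
    population: int = 4,               # traces kept per round (assumed)
    subset: int = 2,                   # tails shown to each aggregation call (assumed)
    rounds: int = 3,                   # number of aggregation rounds (assumed)
    tail_tokens: int = 4096,           # the abstract mentions a 4K-token tail
    seed: int = 0,
) -> List[Trace]:
    rng = random.Random(seed)
    # Round 0: independent parallel traces for the same question.
    traces = [generate(question) for _ in range(population)]
    for _ in range(rounds):
        # Only the truncated tails survive into the next round, so each
        # round depends on bounded state rather than the full history.
        tails = [tail(t, tail_tokens) for t in traces]
        new_traces = []
        for _ in range(population):
            picked = rng.sample(tails, subset)
            joined = "\n---\n".join(" ".join(p) for p in picked)
            prompt = f"{question}\n\nCandidate reasoning tails:\n{joined}"
            new_traces.append(generate(prompt))
        traces = new_traces
    return traces
```

Under these assumptions, the prompt for every aggregation call contains at most `subset * tail_tokens` carried-over tokens, regardless of how long the earlier traces were. A final answer would still have to be selected from the returned traces (for example by majority vote), a step the abstract does not specify.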
