2605.08044v1 May 08, 2026 cs.CL

Fast Byte Latent Transformer

Xiaochuang Han (Citations: 20; h-index: 2)

Christopher Potts (Citations: 102; h-index: 3)

Srinivasan Iyer (Citations: 319; h-index: 4)

Julie Kallini (Citations: 73; h-index: 3)

Artidoro Pagnoni (Citations: 5,612; h-index: 11)

Tomasz Limisiewicz (Citations: 9; h-index: 2)

Gargi Ghosh (Citations: 168; h-index: 4)

Luke S. Zettlemoyer (Citations: 154; h-index: 5)

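Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, but their use is limited by sequential, byte-by-byte generation. This work addresses that bottleneck in the Byte Latent Transformer (BLT) with new training and generation techniques. First, it introduces BLT Diffusion (BLT-D), a new model trained with an auxiliary block-wise diffusion objective alongside the standard next-byte prediction loss. BLT-D generates multiple bytes in parallel at each decoding step, substantially reducing the number of forward passes needed to generate a sequence. Second, it proposes two extensions that give up some of this speed at inference time in exchange for higher generation quality. The first is BLT Self-speculation (BLT-S), in which BLT's local decoder drafts bytes past its normal patch boundaries and the drafts are then verified with a single full-model forward pass. The second is BLT Diffusion+Verification (BLT-DV), which adds an autoregressive verification step after BLT-D's diffusion-based generation. All proposed methods may achieve an estimated memory-bandwidth cost more than 50% lower than BLT on generation tasks. Each approach offers its own advantages, and together they remove key obstacles to the practical use of byte-level LMs.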

Original Abstract

Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the Byte Latent Transformer (BLT) through new training and generation techniques. First, we introduce BLT Diffusion (BLT-D), a new model and our fastest BLT variant, trained with an auxiliary block-wise diffusion objective alongside the standard next-byte prediction loss. This enables an inference procedure that generates multiple bytes in parallel per decoding step, substantially reducing the number of forward passes required to generate a sequence. Second, we propose two extensions inspired by speculative decoding that trade some of this speed for higher generation quality: BLT Self-speculation (BLT-S), in which BLT's local decoder continues generating past its normal patch boundaries to draft bytes, which are then verified with a single full-model forward pass; and BLT Diffusion+Verification (BLT-DV), which augments BLT-D with an autoregressive verification step after diffusion-based generation. All methods may achieve an estimated memory-bandwidth cost over 50% lower than BLT on generation tasks. Each approach offers its own unique advantages, together removing key barriers to the practical use of byte-level LMs.
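The draft-then-verify idea behind BLT-S and BLT-DV can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the acceptance rule, so greedy exact-match acceptance is assumed here, and every name below (`toy_target`, `toy_draft`, `draft_and_verify`, the block size `k`) is hypothetical. The "full model" and the "drafter" are stand-in functions, and the single verification pass is simulated by checking each drafted position and counting the whole check as one pass.

```python
from typing import Callable, Tuple

NextByte = Callable[[bytes], int]  # maps a prefix to the next byte value

PATTERN = b"abcdefgh"


def toy_target(prefix: bytes) -> int:
    """Stand-in for the full model: deterministic greedy next byte."""
    return PATTERN[len(prefix) % len(PATTERN)]


def toy_draft(prefix: bytes) -> int:
    """Stand-in for a cheap drafter: agrees with the target except at every 5th position."""
    if len(prefix) % 5 == 4:
        return ord("?")
    return toy_target(prefix)


def greedy_decode(target_next: NextByte, prompt: bytes, n_new: int) -> Tuple[bytes, int]:
    """Baseline: one full-model pass per generated byte."""
    out = bytearray(prompt)
    for _ in range(n_new):
        out.append(target_next(bytes(out)))
    return bytes(out), n_new


def draft_and_verify(
    target_next: NextByte, draft_next: NextByte, prompt: bytes, n_new: int, k: int
) -> Tuple[bytes, int]:
    """Draft up to k bytes, then verify them in one (simulated) full-model pass."""
    out = bytearray(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # Draft a block of bytes with the cheap model.
        draft = bytearray()
        for _ in range(min(k, goal - len(out))):
            draft.append(draft_next(bytes(out + draft)))
        # Assumption: a real verifier scores all drafted positions in a single pass.
        passes += 1
        for i in range(len(draft)):
            expected = target_next(bytes(out + draft[:i]))
            if expected != draft[i]:
                # Keep the matching prefix, then the verifier's own byte.
                out.extend(draft[:i])
                out.append(expected)
                break
        else:
            out.extend(draft)  # every drafted byte was accepted
    return bytes(out), passes


baseline, baseline_passes = greedy_decode(toy_target, b"", 20)
spec, spec_passes = draft_and_verify(toy_target, toy_draft, b"", 20, k=4)
print(spec == baseline, baseline_passes, spec_passes)
```

Under greedy acceptance the output is identical to the baseline by construction, since every kept byte is one the verifier would have produced itself. Any saving comes only from needing fewer full-model passes, and it shrinks as the drafter disagrees with the verifier more often.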

0 Citations

0 Influential

5.5 Altmetric

27.5 Score
