2602.13940v1 Feb 15, 2026 cs.LG

You Can Learn Tokenization End-to-End with Reinforcement Learning

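Tokenization is a hardcoded compression step that remains in the training pipeline of large language models (LLMs), even as architectures trend toward being fully end-to-end. Prior work has shown promising results in moving this compression step inside the LLM architecture, either by using heuristics to define token boundaries or by learning them with straight-through estimates, which treat the discrete token-boundary problem as a continuous one. This work shows that token boundaries can instead be learned with score function estimates, which directly optimize the discrete boundaries to minimize loss. It also finds that reinforcement learning techniques such as time discounting are necessary to reduce the variance of the score function estimate enough for practical use. The resulting method outperforms previously proposed straight-through estimates, both qualitatively and quantitatively, at the 100-million-parameter scale.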

Original Abstract

Tokenization is a hardcoded compression step which remains in the training pipeline of Large Language Models (LLMs), despite a general trend towards architectures becoming increasingly end-to-end. Prior work has shown promising results at scale in bringing this compression step inside the LLMs' architecture with heuristics to draw token boundaries, and in attempts to learn these token boundaries with straight-through estimates, which treat the problem of drawing discrete token boundaries as a continuous one. We show that these token boundaries can instead be learned using score function estimates, which have tighter theoretical guarantees due to directly optimizing the problem of drawing discrete token boundaries to minimize loss. We observe that techniques from reinforcement learning, such as time discounting, are necessary to reduce the variance of this score function estimate sufficiently to make it practicable. We demonstrate that the resultant method outperforms previously proposed straight-through estimates, both qualitatively and quantitatively, at the 100-million-parameter scale.
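
To make the idea of a score function estimate with time discounting concrete, here is a minimal sketch of a generic REINFORCE-style gradient for Bernoulli boundary decisions. It is not the paper's implementation: the abstract does not specify the reward definition, the discounting scheme, or any baseline, so the function names, the per-position `losses` input, the `gamma` value, and the constant `baseline` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_boundaries(logits, rng):
    """Sample one 0/1 boundary decision per position, P(1) = sigmoid(logit)."""
    return [1 if rng.random() < sigmoid(z) else 0 for z in logits]


def discounted_returns(losses, gamma):
    """Return G_t = sum over k >= t of gamma**(k - t) * losses[k], for every t."""
    returns = [0.0] * len(losses)
    running = 0.0
    for t in reversed(range(len(losses))):
        running = losses[t] + gamma * running
        returns[t] = running
    return returns


def score_function_grad(logits, boundaries, losses, gamma=0.9, baseline=0.0):
    """Single-sample score function gradient of the loss w.r.t. boundary logits.

    Assumption: the decision at position t is credited only with the
    discounted losses at positions >= t. For a Bernoulli variable,
    d/d(logit) log p(b) = b - sigmoid(logit).
    `baseline` must not depend on the sampled boundaries.
    """
    probs = [sigmoid(z) for z in logits]
    returns = discounted_returns(losses, gamma)
    return [(g - baseline) * (b - p)
            for g, b, p in zip(returns, boundaries, probs)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 1.0, -1.0, 0.5]       # hypothetical boundary logits
    boundaries = sample_boundaries(logits, rng)
    losses = [2.0, 1.0, 3.0, 0.5]        # placeholder per-position losses
    print(score_function_grad(logits, boundaries, losses))
```

With `gamma` below 1, losses far from a boundary decision contribute less to its gradient, which is the usual way discounting trades a little bias for lower variance. A straight-through estimate would instead backpropagate through the sampling step as if it were continuous.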
