2603.26097v1 Mar 27, 2026 cs.LG

Dynamic Tokenization via Reinforcement Patching: End-to-end Training and Zero-shot Transfer

Hyeji Kim

Citations: 102

h-index: 5

Yulun Wu

Citations: 4

h-index: 1

S. Ankireddy

Citations: 112

h-index: 7

Sam Sharpe

Citations: 5

h-index: 2

N. Seleznev

Citations: 6

h-index: 2

Dehao Yuan

Citations: 4

h-index: 1

Nam H. Nguyen

Citations: 220

h-index: 5

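Efficiently aggregating spatial or temporal ranges to obtain compact representations has become an important principle in recent deep learning models, but learning data-adaptive representations for sequence data, especially continuous sequences such as time series, remains an open problem. Fixed-size patching has improved scalability and performance, but learning variable-size, data-driven patches end-to-end often forces models to rely on soft discretization, specific backbone architectures, or heuristic rules. This paper proposes Reinforcement Patching (ReinPatch), the first framework that uses reinforcement learning to jointly optimize a sequence patching policy and the downstream sequence backbone model. By formulating patch boundary placement as a discrete decision process optimized via Group Relative Policy Gradient (GRPG), the method optimizes a dynamic patching policy naturally, without continuous relaxation. The method can also strictly enforce a desired compression rate, which lets the downstream backbone scale efficiently, and it naturally supports multi-level hierarchical modeling. Evaluated on time-series forecasting datasets, ReinPatch shows strong performance compared with state-of-the-art data-driven patching strategies. In addition, thanks to its detached design, the patching module can be extracted and used as a standalone foundation patcher, giving the research community visual and empirical insight into the segmentations preferred by a purely performance-driven neural patching strategy.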

Original Abstract

Efficiently aggregating spatial or temporal horizons to acquire compact representations has become a unifying principle in modern deep learning models, yet learning data-adaptive representations for long-horizon sequence data, especially continuous sequences like time series, remains an open challenge. While fixed-size patching has improved scalability and performance, discovering variable-sized, data-driven patches end-to-end often forces models to rely on soft discretization, specific backbones, or heuristic rules. In this work, we propose Reinforcement Patching (ReinPatch), the first framework to jointly optimize a sequence patching policy and its downstream sequence backbone model using reinforcement learning. By formulating patch boundary placement as a discrete decision process optimized via Group Relative Policy Gradient (GRPG), ReinPatch bypasses the need for continuous relaxations and performs dynamic patching policy optimization in a natural manner. Moreover, our method allows strict enforcement of a desired compression rate, freeing the downstream backbone to scale efficiently, and naturally supports multi-level hierarchical modeling. We evaluate ReinPatch on time-series forecasting datasets, where it demonstrates compelling performance compared to state-of-the-art data-driven patching strategies. Furthermore, our detached design allows the patching module to be extracted as a standalone foundation patcher, providing the community with visual and empirical insights into the segmentation behaviors preferred by a purely performance-driven neural patching strategy.
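
The abstract does not spell out the GRPG update, so the following is only a rough sketch of the two ideas it names: sampling discrete patch boundaries under a fixed compression rate, and scoring each sample relative to its group. It assumes that "group relative" means normalizing each reward by the mean and standard deviation of a group of sampled segmentations, as in other group-relative policy-gradient methods; that assumption, the Gumbel top-k sampler, the `toy_reward` stand-in, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random
from statistics import mean, pstdev


def sample_boundaries(logits, num_patches, rng):
    """Sample exactly num_patches - 1 distinct cut points via Gumbel top-k.

    logits[i] scores a cut between positions i and i + 1 (a hypothetical
    policy output). Fixing the number of cuts fixes the compression rate.
    """
    perturbed = []
    for i, logit in enumerate(logits):
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # avoid log(0)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, i + 1))
    top = sorted(perturbed, reverse=True)[: num_patches - 1]
    return sorted(cut for _, cut in top)


def split_into_patches(series, cuts):
    """Slice the series at the given cut points into contiguous patches."""
    edges = [0] + list(cuts) + [len(series)]
    return [series[a:b] for a, b in zip(edges[:-1], edges[1:])]


def toy_reward(patches):
    """Stand-in reward (assumption): negative within-patch squared deviation.

    In the paper the reward would come from the downstream backbone.
    """
    total = 0.0
    for patch in patches:
        m = mean(patch)
        total += sum((x - m) ** 2 for x in patch)
    return -total


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


rng = random.Random(0)
series = [0.0, 0.1, 0.0, 5.0, 5.1, 4.9, 9.0, 9.2]
logits = [0.0] * (len(series) - 1)  # uniform policy over the 7 candidate cuts
num_patches = 3                     # 8 points -> 3 patches

# Sample a group of segmentations, score each, and compare within the group.
group = [sample_boundaries(logits, num_patches, rng) for _ in range(4)]
rewards = [toy_reward(split_into_patches(series, cuts)) for cuts in group]
advantages = group_relative_advantages(rewards)
```

In a policy-gradient step, each advantage would weight the log-probability of its sampled segmentation, so cuts that beat the group average are reinforced. Because every sample has exactly `num_patches - 1` cuts, the backbone always sees the same number of patches, regardless of where the boundaries fall.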

2 Citations

0 Influential

3.5 Altmetric

19.5 Score
