2601.21641v1 Jan 29, 2026 cs.LG

Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers

Original Abstract

Transformer-based models have recently made significant advances in accurate time-series forecasting, but even these architectures struggle to scale efficiently while capturing long-term temporal dynamics. Mixture-of-Experts (MoE) layers are a proven solution to scaling problems in natural language processing. However, existing MoE approaches for time-series forecasting rely on token-wise routing mechanisms, which may fail to exploit the natural locality and continuity of temporal data. In this work, we introduce Seg-MoE, a sparse MoE design that routes and processes contiguous time-step segments rather than making independent expert decisions. Token segments allow each expert to model intra-segment interactions directly, naturally aligning with inherent temporal patterns. We integrate Seg-MoE layers into a time-series Transformer and evaluate it on multiple multivariate long-term forecasting benchmarks. Seg-MoE consistently achieves state-of-the-art forecasting accuracy across almost all prediction horizons, outperforming both dense Transformers and prior token-wise MoE models. Comprehensive ablation studies confirm that segment-level routing is the key factor driving these gains. Our results show that aligning the MoE routing granularity with the inherent structure of time series provides a powerful, yet previously underexplored, inductive bias, opening new avenues for conditionally sparse architectures in sequential data modeling.
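The contrast between token-wise and segment-wise routing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the router scores a segment, so the mean-pooling step, the softmax over the selected experts, and the names `segment_route`, `gate`, `seg_len`, and `top_k` are all assumptions made for illustration. The sketch covers only the routing decision; in the design the abstract describes, the chosen expert would then process the whole segment jointly.

```python
import math
import random


def segment_route(tokens, gate, seg_len, top_k=1):
    """Pick one set of experts per contiguous segment of time steps.

    tokens: list of T vectors (lists of floats), one per time step
    gate:   list of E gating vectors, one per expert (hypothetical router weights)
    Returns a list of (start, end, [(expert_index, weight), ...]) per segment.
    """
    routes = []
    for start in range(0, len(tokens), seg_len):
        segment = tokens[start:start + seg_len]
        dim = len(segment[0])
        # Assumption: summarize the segment by mean-pooling its time steps.
        pooled = [sum(tok[d] for tok in segment) / len(segment) for d in range(dim)]
        # One router logit per expert: dot product with that expert's gating vector.
        logits = [sum(p * w for p, w in zip(pooled, g)) for g in gate]
        # Keep the top_k experts and normalize their weights with a softmax.
        chosen = sorted(range(len(gate)), key=lambda e: logits[e], reverse=True)[:top_k]
        peak = max(logits[e] for e in chosen)
        exps = [math.exp(logits[e] - peak) for e in chosen]
        total = sum(exps)
        weights = [(e, x / total) for e, x in zip(chosen, exps)]
        routes.append((start, start + len(segment), weights))
    return routes


random.seed(0)
T, D, E = 8, 4, 3  # time steps, feature dimension, number of experts
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
gate = [[random.gauss(0, 1) for _ in range(D)] for _ in range(E)]

# Segment-wise: every time step in a segment shares the same expert choice.
segment_routes = segment_route(tokens, gate, seg_len=4, top_k=2)
# Token-wise baseline: a segment length of 1 gives one decision per time step.
token_routes = segment_route(tokens, gate, seg_len=1, top_k=2)
```

With `seg_len=1` the function reduces to ordinary token-wise routing, so the same code shows both granularities: eight independent decisions for eight time steps versus two shared decisions for two segments of four.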
