2601.21866v1 Jan 29, 2026 cs.LG

MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts

Evandro S. Ortigossa

Guy Lutsker

Eran Segal

Original Abstract

Real-world multivariate time series can exhibit intricate multi-scale structures, including global trends, local periodicities, and non-stationary regimes, which makes long-horizon forecasting challenging. Although sparse Mixture-of-Experts (MoE) approaches improve scalability and specialization, they typically rely on homogeneous MLP experts that poorly capture the diverse temporal dynamics of time series data. We address these limitations with MoHETS, an encoder-only Transformer that integrates sparse Mixture-of-Heterogeneous-Experts (MoHE) layers. MoHE routes temporal patches to a small subset of expert networks, combining a shared depthwise-convolution expert for sequence-level continuity with routed Fourier-based experts for patch-level periodic structures. MoHETS further improves robustness to non-stationary dynamics by incorporating exogenous information via cross-attention over covariate patch embeddings. Finally, we replace parameter-heavy linear projection heads with a lightweight convolutional patch decoder, improving parameter efficiency, reducing training instability, and allowing a single model to generalize across arbitrary forecast horizons. We validate MoHETS on seven multivariate benchmarks and multiple horizons, where it consistently achieves state-of-the-art performance and reduces the average MSE by 12% compared to strong recent baselines, demonstrating effective heterogeneous specialization for long-term forecasting.
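The abstract describes the MoHE layer only at a high level. The following Python sketch is one possible reading of it under stated assumptions, not the authors' implementation: patch embeddings are taken to be the raw patch values, the router is a single linear map with softmax top-k gating, the shared expert is a per-channel (depthwise) convolution over the patch sequence, and each routed Fourier-based expert simply rescales a patch's DFT bins with fixed gains. The names FourierExpert, depthwise_conv, route and mohe_layer, and all toy numbers, are hypothetical; learned parameters, normalization, the covariate cross-attention and the convolutional patch decoder are omitted.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; keeps the real part (spectrum assumed conjugate-symmetric)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


class FourierExpert:
    """Hypothetical routed expert: rescales the frequency bins of one patch.

    gains holds one real gain per non-negative frequency bin, so
    len(gains) == patch_len // 2 + 1. In a trained model these would be learned.
    """

    def __init__(self, gains):
        self.gains = gains

    def __call__(self, patch):
        n = len(patch)
        spectrum = dft(patch)
        # Bins k and n - k share a gain, so the filtered spectrum stays
        # conjugate-symmetric and the inverse transform is real.
        filtered = [spectrum[k] * self.gains[min(k, n - k)] for k in range(n)]
        return idft(filtered)


def depthwise_conv(patches, kernels):
    """Hypothetical shared expert: one 1-D kernel per channel, slid along the
    patch sequence (cross-correlation, as in deep-learning libraries).

    patches: L patch embeddings, each a list of d floats.
    kernels: d kernels, each of odd length K; zero padding keeps length L.
    """
    num_patches, dim = len(patches), len(patches[0])
    out = [[0.0] * dim for _ in range(num_patches)]
    for c in range(dim):
        half = len(kernels[c]) // 2
        for i in range(num_patches):
            acc = 0.0
            for j, w in enumerate(kernels[c]):
                src = i + j - half
                if 0 <= src < num_patches:
                    acc += w * patches[src][c]
            out[i][c] = acc
    return out


def route(patch, router_weights, top_k):
    """Linear router + softmax; keep the top_k experts and renormalize their gates."""
    logits = [sum(w * x for w, x in zip(row, patch)) for row in router_weights]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
    norm = sum(probs[e] for e in chosen)
    return [(e, probs[e] / norm) for e in chosen]


def mohe_layer(patches, conv_kernels, experts, router_weights, top_k=1):
    """Shared depthwise-conv expert on every patch plus top_k gated Fourier experts."""
    shared = depthwise_conv(patches, conv_kernels)
    outputs = []
    for i, patch in enumerate(patches):
        y = list(shared[i])
        for e, gate in route(patch, router_weights, top_k):
            y = [a + gate * b for a, b in zip(y, experts[e](patch))]
        outputs.append(y)
    return outputs


# Toy usage: 3 patches of length 4, 2 routed experts, top-1 routing.
patches = [[1.0, 0.0, -1.0, 0.0], [0.5, 0.5, 0.5, 0.8], [0.0, 1.0, 0.0, -1.0]]
identity_kernels = [[0.0, 1.0, 0.0] for _ in range(4)]  # shared expert passes input through
experts = [FourierExpert([1.0, 1.0, 1.0]),   # all-pass filter
           FourierExpert([1.0, 0.0, 0.0])]   # keeps only the DC bin (patch mean)
router = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
out = mohe_layer(patches, identity_kernels, experts, router, top_k=1)
print(out)
```

In this sketch every patch always passes through the shared convolution path, while only top_k of the routed experts run for each patch; that per-patch sparsity is what lets MoE layers add experts without a proportional increase in per-patch compute.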
