GateTS: Versatile and Efficient Forecasting via Attention-Inspired Routed Mixture-of-Experts
By: Kyrylo Yemets, Mykola Lukashchuk, Ivan Izonin
Potential Business Impact:
Makes forecasts more accurate and cheaper to run.
Accurate univariate forecasting remains a pressing need in real-world systems such as energy markets, hydrology, retail demand, and IoT monitoring, where signals are often intermittent and horizons span both the short and the long term. While transformers and Mixture-of-Experts (MoE) architectures are increasingly favored for time-series forecasting, a key gap persists: MoE models typically require a complicated training objective that combines the main forecasting loss with auxiliary load-balancing losses, along with careful routing/temperature tuning, which hinders practical adoption. In this paper, we propose a model architecture that simplifies training for univariate time-series forecasting and effectively addresses both long- and short-term horizons, including intermittent patterns. Our approach combines sparse MoE computation with a novel attention-inspired gating mechanism that replaces the traditional one-layer softmax router. Through extensive empirical evaluation, we demonstrate that our gating design naturally promotes balanced expert utilization and achieves superior predictive accuracy without the auxiliary load-balancing losses typically required by classical MoE implementations. The model achieves better performance while using only a fraction of the parameters required by state-of-the-art transformer models such as PatchTST. Furthermore, experiments across diverse datasets confirm that our MoE architecture with the proposed gating mechanism is more computationally efficient than an LSTM for both long- and short-term forecasting, enabling cost-effective inference. These results highlight the potential of our approach for practical time-series forecasting applications where both accuracy and computational efficiency are critical.
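The abstract describes the architecture only at a high level, so the sketch below is one plausible reading of the mechanism: a scaled dot-product, attention-style gate over learned per-expert keys in place of a one-layer softmax router, combined with sparse top-k expert execution. All names (AttentionGate, SparseMoE), layer sizes, and the top-k routing scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an attention-inspired MoE router, based only on the
# abstract's description. Class names, dimensions, and the top-k scheme
# are assumptions for illustration, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGate(nn.Module):
    """Scores experts via scaled dot-product between a learned query
    projection of the input and per-expert key vectors, rather than a
    single linear layer followed by softmax."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        # One learned key vector per expert.
        self.expert_keys = nn.Parameter(torch.randn(num_experts, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) -> gate weights: (batch, num_experts)
        q = self.query(x)
        scores = q @ self.expert_keys.t() / (x.shape[-1] ** 0.5)
        return F.softmax(scores, dim=-1)


class SparseMoE(nn.Module):
    """Sparse mixture of expert MLPs: only the top-k experts selected
    by the attention-style gate are evaluated for each input."""

    def __init__(self, d_model: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = AttentionGate(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.gate(x)                             # (B, E)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # (B, k)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)    # renormalize
        out = torch.zeros_like(x)
        # Evaluate only the selected experts for each input.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += top_w[mask, slot, None] * self.experts[e](x[mask])
        return out
```

Under this reading, the model is trained end-to-end with the forecasting loss alone (e.g., MSE over the prediction horizon); the abstract's claim is that the attention-style scoring keeps expert utilization balanced on its own, with no auxiliary load-balancing term in the objective.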
Similar Papers
Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems
Machine Learning (CS)
Makes AI learn tasks much faster and better.
A Mixture of Experts Gating Network for Enhanced Surrogate Modeling in External Aerodynamics
Machine Learning (CS)
Makes car designs faster by predicting air flow.
Wavelet Mixture of Experts for Time Series Forecasting
Machine Learning (CS)
Predicts future events more accurately with less data.