Time-MoE: Mixture-of-Experts Time-Series Foundation Model

Time-MoE is a billion-scale autoregressive foundation model for universal time-series forecasting, introduced by Shi et al. in 2024 and accepted at ICLR 2025. It combines a decoder-only transformer architecture with sparse Mixture-of-Experts (MoE) feed-forward layers. Because a router activates only a small subset of expert networks for each token, the model can scale to billions of total parameters while per-token compute grows with the number of active experts rather than with the total parameter count.
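
The sparse routing idea can be illustrated with a minimal sketch. This is not the authors' implementation: the class name SparseMoELayer, the single-linear-map experts, the layer sizes, and the renormalisation of gate weights over the selected experts are all assumptions chosen to keep the example small. It shows only the general top-k pattern, in which a router scores every expert but only the k best-scoring experts are evaluated for a given token.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseMoELayer:
    """Toy top-k MoE layer (hypothetical; each expert is one linear map)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one weight row per expert, producing one logit per expert.
        self.router = [
            [rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_experts)
        ]
        # Experts: independent dim x dim linear maps (an assumption for brevity).
        self.experts = [
            [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, token):
        """Return (output vector, indices of the experts that were evaluated)."""
        probs = softmax(matvec(self.router, token))
        # Keep only the top_k highest-probability experts for this token.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        chosen = ranked[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        output = [0.0] * len(token)
        for i in chosen:
            gate = probs[i] / norm  # assumed: renormalise over chosen experts
            expert_out = matvec(self.experts[i], token)  # unchosen experts never run
            output = [o + gate * e for o, e in zip(output, expert_out)]
        return output, chosen


layer = SparseMoELayer(dim=4, num_experts=8, top_k=2)
output, chosen = layer.forward([0.5, -1.0, 0.25, 2.0])
print(f"experts evaluated: {len(chosen)} of {len(layer.experts)}")
```

In this sketch the layer stores eight experts' worth of parameters, but each token pays for only two expert evaluations plus the router, which is the sense in which capacity grows faster than compute.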

Sources

  1. Shi, X., Wang, S., Nie, Y., Li, D., Ye, Z., Wen, Q., & Jin, M. (2024). Time-MoE: Billion-scale time series foundation models with mixture of experts. ICLR 2025.

ScholarGate. Time-MoE (Mixture-of-Experts Time-Series Foundation Model). Retrieved 2026-06-04 from https://scholargate.app/en/deep-learning/time-moe