Mixture of Experts

Mixture of Experts (MoE) is a sparse neural-network architecture in which a gating network activates only a subset of expert sub-networks for each input. Its sparsely-gated form, the sparsely-gated MoE layer, was introduced by Shazeer and colleagues in 2017. In models such as Switch Transformer and Mixtral, this design keeps the computation per input roughly constant even as the total parameter count grows, because adding experts adds parameters without increasing the number of experts each input passes through.
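
The routing idea can be illustrated with a minimal NumPy sketch. It is not the implementation from any of the cited models: the function name `moe_forward`, the use of a single linear map per expert, the choice of 8 experts with top-2 routing, and all shapes are assumptions made for illustration, and details such as gating noise and load-balancing losses are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def moe_forward(x, gate_w, expert_ws, k=2):
    """Hypothetical top-k MoE layer with linear experts.

    x:         (tokens, d)        input vectors
    gate_w:    (d, n_experts)     gating (router) weights
    expert_ws: (n_experts, d, d)  one weight matrix per expert (a simplification;
                                  real experts are usually small feed-forward nets)
    """
    logits = x @ gate_w                                  # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -k:]        # k highest-scoring experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    weights = softmax(top_logits, axis=-1)               # normalise over the selected experts only

    out = np.zeros_like(x)
    for e in range(expert_ws.shape[0]):
        # Rows (tokens) that routed to expert e, and the slot it occupies in their top-k.
        token_ids, slot = np.nonzero(top_idx == e)
        if token_ids.size == 0:
            continue                                     # expert e does no work for this batch
        gate = weights[token_ids, slot][:, None]         # (n_routed, 1)
        out[token_ids] += gate * (x[token_ids] @ expert_ws[e])
    return out, top_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_experts, k = 4, 8, 2                            # illustrative sizes only
    x = rng.normal(size=(5, d))
    gate_w = rng.normal(size=(d, n_experts))
    expert_ws = rng.normal(size=(n_experts, d, d))
    out, top_idx = moe_forward(x, gate_w, expert_ws, k=k)
    print("output shape:", out.shape)
    print("experts chosen per token:", top_idx.tolist())
```

In this sketch each token is multiplied by only `k` expert matrices, so raising `n_experts` increases the parameter count while the expert computation per token stays the same; only the small gating product grows.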

Sources

  1. Shazeer, N. et al. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. ICLR. arXiv:1701.06538.
  2. Jiang, A.Q. et al. (2024). Mixtral of Experts. arXiv.

ScholarGate. Mixture of Experts (Sparsely-Gated Mixture of Experts (MoE)). Retrieved 2026-06-04 from https://scholargate.app/en/deep-learning/mixture-of-experts