
Multimodal Variational Autoencoder

The Multimodal Variational Autoencoder (MVAE) is a deep generative model that learns a shared latent representation across two or more data modalities — such as images and captions — using a product-of-experts fusion of modality-specific encoders, enabling generation and inference even when only a subset of modalities is observed at test time.
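
The product-of-experts fusion mentioned above can be illustrated with a minimal sketch. It assumes each modality encoder outputs a diagonal Gaussian (a mean and a variance per latent dimension) and that a standard normal prior is included as one more expert. Under those assumptions the fused posterior is again Gaussian, with precision equal to the sum of the experts' precisions. The function name `product_of_experts` and the toy numbers are hypothetical, chosen for illustration only, and a missing modality is represented by `None` and simply skipped.

```python
def product_of_experts(experts, dim, prior_mean=0.0, prior_var=1.0):
    """Fuse diagonal Gaussian experts into one Gaussian.

    experts: list of (mean, var) pairs, each a list of length `dim`;
             use None for a modality that is not observed.
    Returns (fused_mean, fused_var) as lists of length `dim`.
    """
    # Start from the prior expert (assumed standard normal by default).
    precision = [1.0 / prior_var] * dim
    weighted_mean = [prior_mean / prior_var] * dim

    for expert in experts:
        if expert is None:
            continue  # missing modality: contributes nothing to the product
        mean, var = expert
        for k in range(dim):
            precision[k] += 1.0 / var[k]          # precisions add
            weighted_mean[k] += mean[k] / var[k]  # precision-weighted means add

    fused_var = [1.0 / p for p in precision]
    fused_mean = [m / p for m, p in zip(weighted_mean, precision)]
    return fused_mean, fused_var


# Toy encoder outputs for a 2-dimensional latent space (made-up numbers).
image_expert = ([2.0, 0.0], [1.0, 1.0])
caption_expert = ([0.0, 4.0], [1.0, 0.5])

both = product_of_experts([image_expert, caption_expert], dim=2)
image_only = product_of_experts([image_expert, None], dim=2)
print("both modalities:", both)
print("image only:     ", image_only)
```

Because an unobserved modality is dropped from the product rather than imputed, the same function handles any subset of inputs, which is what allows inference when only some modalities are available at test time. Adding an expert can only increase the total precision, so the fused variance with both modalities is never larger than with one.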

Sources

  1. Wu, M., & Goodman, N. (2018). Multimodal Generative Models for Scalable Weakly-Supervised Learning. Advances in Neural Information Processing Systems (NeurIPS), 31.
  2. Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. International Conference on Learning Representations (ICLR).

ScholarGate. Multimodal Variational Autoencoder (MVAE). Retrieved 2026-06-04 from https://scholargate.app/en/deep-learning/multimodal-variational-autoencoder