Variational Inference
Variational inference turns posterior approximation into optimization: it fits a simpler distribution to the posterior by maximizing a lower bound on the log marginal likelihood.
Definition
Variational inference approximates an intractable posterior by selecting, from a tractable family of distributions, the member with the smallest Kullback-Leibler (KL) divergence from itself to the posterior. Because that divergence and the evidence lower bound (ELBO) sum to the log marginal likelihood, which does not depend on the approximation, minimizing the divergence is equivalent to maximizing the ELBO.
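The equivalence in the definition can be written out as a short identity. The notation is an assumption introduced here for illustration: x denotes the observed data, z the latent variables or parameters, q(z) a candidate approximation, and \mathcal{Q} the tractable family.

```latex
\begin{aligned}
\log p(x)
  &= \mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z)}{q(z)}\right]
   + \mathbb{E}_{q(z)}\!\left[\log \frac{q(z)}{p(z \mid x)}\right] \\
  &= \mathrm{ELBO}(q) + \mathrm{KL}\!\left(q(z) \,\|\, p(z \mid x)\right),
\end{aligned}
\qquad
q^{*} = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\!\left(q(z) \,\|\, p(z \mid x)\right)
      = \arg\max_{q \in \mathcal{Q}} \mathrm{ELBO}(q).
```

The first line holds because p(x, z) / p(z | x) = p(x) for every z. Since the KL term is non-negative, the ELBO never exceeds log p(x), and it can be evaluated without the intractable posterior because it involves only the joint p(x, z) and q(z).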
Scope
This topic covers the variational objective (the evidence lower bound), the mean-field family and its factorization assumptions, coordinate-ascent and stochastic gradient algorithms, and the trade-offs between speed and the systematic biases of approximate inference.
Core questions
- How is posterior approximation framed as an optimization problem?
- What is the evidence lower bound and how is it related to the KL divergence?
- What does the mean-field assumption sacrifice in exchange for tractability?
- How do stochastic and black-box methods scale variational inference to large data?
Key concepts
- evidence lower bound
- Kullback-Leibler divergence
- mean-field family
- coordinate-ascent variational inference
- stochastic variational inference
- black-box variational inference
- variance underestimation
Key theories
- Evidence lower bound
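- The log marginal likelihood equals the ELBO plus the KL divergence from the approximation to the posterior. Because the log marginal likelihood is fixed with respect to the approximation, maximizing the ELBO is equivalent to minimizing that divergence, which recasts inference as a tractable optimization over a chosen family.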
- Mean-field approximation
- Assuming the approximate posterior factorizes across parameter blocks yields closed-form coordinate-ascent updates in conditionally conjugate models, but the factorized approximation cannot represent dependence between blocks and tends to underestimate posterior variance.
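The ELBO identity in the first item above can be checked numerically in a model small enough to solve exactly. The sketch below assumes a toy conjugate model that is not part of this entry: prior z ~ N(0, 1), likelihood x | z ~ N(z, 1), one observation x, and a Gaussian approximation q(z) = N(m, s^2). In this model the exact posterior is N(x/2, 1/2) and the marginal is x ~ N(0, 2). The function names are illustrative.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)

def elbo(x, m, s):
    """ELBO of q(z) = N(m, s^2) for prior z ~ N(0, 1), likelihood x | z ~ N(z, 1)."""
    exp_log_lik = -0.5 * LOG_2PI - 0.5 * ((x - m) ** 2 + s ** 2)  # E_q[log p(x | z)]
    exp_log_prior = -0.5 * LOG_2PI - 0.5 * (m ** 2 + s ** 2)      # E_q[log p(z)]
    entropy = 0.5 * (LOG_2PI + 1.0) + math.log(s)                 # entropy of q
    return exp_log_lik + exp_log_prior + entropy

def kl_to_posterior(x, m, s):
    """KL(q || p(z | x)); the exact posterior in this toy model is N(x / 2, 1 / 2)."""
    post_mean, post_var = x / 2.0, 0.5
    return (0.5 * math.log(post_var) - math.log(s)
            + (s ** 2 + (m - post_mean) ** 2) / (2.0 * post_var) - 0.5)

def log_evidence(x):
    """log p(x), using the marginal x ~ N(0, 2)."""
    return -0.5 * (LOG_2PI + math.log(2.0)) - x ** 2 / 4.0

x = 1.3
# Two arbitrary approximations, then the exact posterior itself.
candidates = [(0.0, 1.0), (0.5, 0.5), (x / 2.0, math.sqrt(0.5))]
for m, s in candidates:
    print(f"m={m:.2f} s={s:.2f}  ELBO={elbo(x, m, s):.4f}  "
          f"KL={kl_to_posterior(x, m, s):.4f}  log p(x)={log_evidence(x):.4f}")
```

For every candidate the ELBO and the KL term add up to the same log p(x), and the KL term vanishes only for the last candidate, where q is the exact posterior. In realistic models the posterior and log p(x) are unavailable, which is why the ELBO is the quantity that gets optimized.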
Practical relevance
Variational inference scales Bayesian methods to large datasets and complex models in text analysis, genomics, and deep learning, where the cost of full MCMC would be prohibitive and a fast approximate posterior suffices.
History
Variational methods entered machine learning through mean-field approximations for graphical models in the late 1990s. Stochastic and automatic variational inference in the 2010s, surveyed by Blei and colleagues in 2017, brought scalable approximate Bayesian inference to mainstream statistics and probabilistic programming.
Debates
- Bias of approximate posteriors
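- Variational inference is fast, but minimizing the KL divergence from the approximation to the posterior, particularly over a mean-field family, tends to understate posterior uncertainty. How far its approximate posteriors can be trusted relative to asymptotically exact MCMC is therefore debated.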
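One setting where this understatement can be worked out exactly is a correlated Gaussian target fitted by a fully factorized approximation. The sketch below is a minimal illustration under stated assumptions, not a general-purpose implementation: the target is a two-dimensional Gaussian N(mu, Sigma) standing in for a posterior, the coordinate updates are the standard closed-form mean-field result for Gaussian targets, and the names cavi_gaussian, mu, Sigma, and rho are chosen here for illustration.

```python
import numpy as np

def cavi_gaussian(mu, Sigma, n_iters=200):
    """Coordinate-ascent mean-field fit q(z) = q1(z1) q2(z2) to a 2-D Gaussian N(mu, Sigma).

    Returns the factor means and factor variances.
    """
    Lam = np.linalg.inv(Sigma)  # precision matrix of the target
    m = np.zeros(2)             # arbitrary starting values for the factor means
    for _ in range(n_iters):
        # Each factor is Gaussian; its mean depends on the other factor's current mean.
        m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
        m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])
    # The optimal factor variances are fixed by the precision diagonal.
    var = 1.0 / np.diag(Lam)
    return m, var

rho = 0.9
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, rho], [rho, 1.0]])

m, var = cavi_gaussian(mu, Sigma)
print("mean-field means:      ", m)
print("mean-field variances:  ", var)
print("true marginal variances:", np.diag(Sigma))
```

In this example the factor means converge to the true means, but each factor variance equals 1 - rho^2, well below the true marginal variance of 1, and the gap widens as the correlation grows. The point estimates are recovered while the spread is not, which is the pattern at the center of the debate.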
Key figures
- Michael Jordan
- Zoubin Ghahramani
- David Blei
- Tommi Jaakkola
Related topics
Seminal works
- blei2017
- jordan1999
Frequently asked questions
- When should I use variational inference instead of MCMC?
- Variational inference is attractive when datasets or models are too large for MCMC to be feasible and a fast, approximate posterior is acceptable; MCMC remains preferable when accurate uncertainty quantification is essential, because variational methods tend to underestimate posterior variance.