Latent Variable and Mixture Models

Latent-variable and mixture models explain observed data through hidden variables. They are typically fitted by alternating between inferring the hidden structure and updating the model parameters.

Definition

A latent-variable model represents each observation as generated with the help of unobserved variables, such as which mixture component produced a point. The expectation-maximization (EM) algorithm estimates the parameters of such a model by iterating between two steps: computing the posterior distribution over the latent variables under the current parameters, and maximizing the resulting expected complete-data log-likelihood with respect to the parameters.
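In symbols, one EM iteration can be written as below. The notation is an assumption made for this sketch and is not fixed by the definition above: x denotes the observed data, z the latent variables, and \theta^{(t)} the parameter estimate after t iterations.

```latex
\begin{align}
\text{E-step:}\quad & Q(\theta \mid \theta^{(t)})
  = \mathbb{E}_{z \sim p(z \mid x, \theta^{(t)})}\bigl[\log p(x, z \mid \theta)\bigr] \\
\text{M-step:}\quad & \theta^{(t+1)} = \arg\max_{\theta} \, Q(\theta \mid \theta^{(t)})
\end{align}
```

For a mixture model, the posterior p(z | x, \theta^{(t)}) reduces to one probability per observation and component, often called a responsibility.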

Scope

This topic covers probabilistic models with unobserved variables: finite mixture models such as the Gaussian mixture, hidden Markov models for sequences, and the expectation-maximization algorithm that fits them by maximizing likelihood. It also covers the connection to soft clustering, density estimation, and the variational view of EM as bounding the data likelihood.
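The variational view mentioned here rests on a standard decomposition of the log-likelihood, sketched below. The symbols are assumptions made for illustration: q(z) is any distribution over the latent variables z, x is the observed data, and \mathcal{L} is the lower bound.

```latex
\log p(x \mid \theta)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z \mid \theta)}{q(z)}\right]}_{\mathcal{L}(q,\,\theta)}
  + \mathrm{KL}\bigl(q(z) \,\|\, p(z \mid x, \theta)\bigr)
```

Because the KL term is non-negative, \mathcal{L}(q, \theta) never exceeds \log p(x \mid \theta). Setting q(z) = p(z \mid x, \theta^{(t)}) makes the bound tight at the current parameters, and maximizing \mathcal{L} over \theta with q held fixed then cannot decrease the log-likelihood.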

Core questions

How do hidden variables explain observed data?
How does the expectation-maximization algorithm increase likelihood at each step?
How do Gaussian mixtures perform soft clustering and density estimation?
Why might EM converge only to a local optimum?

Key theories

The expectation-maximization algorithm: EM alternates an expectation step, which infers the distribution over latent variables under the current parameters, with a maximization step, which updates the parameters. Each iteration provably never decreases the data likelihood, and under mild regularity conditions the iterates converge to a stationary point of the likelihood.
Gaussian mixture models: Modeling data as a weighted sum of Gaussian components yields flexible density estimates and soft cluster assignments, with each point given a probability of belonging to each component.
EM as lower-bound maximization: EM can be viewed as maximizing a variational lower bound on the log-likelihood, a perspective that generalizes to approximate inference in more complex latent-variable models.
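The E-step and M-step described above can be sketched for a one-dimensional Gaussian mixture as follows. This is a minimal illustration in pure Python rather than a reference implementation: the function names (normal_pdf, log_likelihood, em_step), the variance floor, the synthetic data, and the starting values are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the current mixture parameters."""
    total = 0.0
    for x in data:
        mix = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(mix)
    return total


def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration; returns updated (weights, means, variances)."""
    k = len(weights)
    # E-step: resp[i][j] is the posterior probability that component j
    # generated point i under the current parameters (a soft assignment).
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        norm = sum(joint)
        resp.append([p / norm for p in joint])
    # M-step: responsibility-weighted maximum-likelihood updates.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points in component j
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        # Assumed safeguard against a collapsing component; not part of textbook EM.
        new_v.append(max(var, var_floor))
    return new_w, new_m, new_v


# Synthetic data from two well-separated Gaussians (illustrative values only).
rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]

weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
history = [log_likelihood(data, weights, means, variances)]
for _ in range(30):
    weights, means, variances = em_step(data, weights, means, variances)
    history.append(log_likelihood(data, weights, means, variances))
```

The list history records the log-likelihood after each iteration. Up to floating-point rounding it should be non-decreasing, which is the monotonicity property stated above. The responsibilities computed in the E-step are the soft cluster assignments, and the fitted mixture is the density estimate.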

Practical relevance

Latent-variable models underpin soft clustering, density estimation, missing-data imputation, and sequence modeling with hidden Markov models in speech and bioinformatics. The expectation-maximization algorithm commonly used to fit them is one of the most widely used estimation procedures in statistics and machine learning.

History

Special cases of the expectation-maximization idea appeared in genetics and incomplete-data problems before Dempster, Laird, and Rubin gave the general formulation in 1977. Gaussian mixtures and hidden Markov models became standard latent-variable tools, and the variational reinterpretation of EM later connected it to modern approximate-inference methods.

Key figures

Arthur Dempster
Nan Laird
Donald Rubin

Seminal works

dempster1977
bishop2006
murphy2012

Frequently asked questions

What is a latent variable?: A latent variable is an unobserved quantity included in a model to explain the observed data, such as which hidden cluster generated a data point. The model infers a distribution over these hidden variables rather than measuring them directly.
Why can the EM algorithm get stuck?: EM never decreases the likelihood from one iteration to the next, but it is only guaranteed to converge to a stationary point, typically a local maximum, rather than the global maximum. Different initializations can therefore lead to different solutions, so practitioners often run it several times from different starting values and keep the run with the highest likelihood.
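The restart strategy can be sketched as follows for a one-dimensional Gaussian mixture, using only the Python standard library. Everything specific here is an assumption made for illustration: the helper names (mixture_density, log_likelihood, fit_once), the initialization from randomly chosen data points, the variance floor, the number of restarts, and the synthetic data.

```python
import math
import random


def mixture_density(x, params):
    """Density at x of a 1-D mixture given (weight, mean, variance) triples."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in params
    )


def log_likelihood(data, params):
    # The tiny floor only guards against log(0) caused by numerical underflow.
    return sum(math.log(max(mixture_density(x, params), 1e-300)) for x in data)


def fit_once(data, k, seed, n_iter=100, var_floor=1e-3):
    """Run EM from one random initialization; returns (log-likelihood, params)."""
    rng = random.Random(seed)
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    # Initial means are k randomly chosen data points; equal weights, shared variance.
    params = [(1.0 / k, m, var_all) for m in rng.sample(data, k)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [mixture_density(x, [p]) for p in params]
            norm = sum(joint) or 1e-300  # avoid division by zero on underflow
            resp.append([p / norm for p in joint])
        # M-step: responsibility-weighted parameter updates.
        params = []
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
            params.append((nj / len(data), mean, max(var, var_floor)))
    return log_likelihood(data, params), params


# Synthetic data from three clusters (illustrative values only).
rng = random.Random(1)
data = (
    [rng.gauss(-4.0, 0.6) for _ in range(60)]
    + [rng.gauss(0.0, 0.6) for _ in range(60)]
    + [rng.gauss(4.0, 0.6) for _ in range(60)]
)

# Run EM from several starting points and keep the highest-likelihood solution.
runs = [fit_once(data, k=3, seed=s) for s in range(8)]
best_ll, best_params = max(runs, key=lambda run: run[0])
```

Comparing the first element of each entry in runs shows whether the restarts reached the same solution. Keeping the best run does not guarantee a global maximum; it only reduces the chance of reporting a poor local one.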