Why use the expectation-maximization algorithm instead of direct maximization?

When data are incomplete or a model has latent variables, the observed-data likelihood can be awkward to maximize directly. The algorithm works with the simpler complete-data likelihood and converges reliably with a guaranteed non-decrease at each step.

Does the algorithm always find the global maximum?

No. It converges to a stationary point that depends on the starting values, which may be a local maximum or saddle point. Running it from several initializations is the usual safeguard.

EM Algorithm

The expectation-maximization algorithm finds maximum-likelihood estimates in models with missing data or latent variables by alternating between imputing the missing information and maximizing the resulting expected log-likelihood.

Definition

The expectation-maximization algorithm is an iterative procedure that maximizes a likelihood with incomplete data by repeatedly forming the expected complete-data log-likelihood given current parameters (the E-step) and maximizing it to update the parameters (the M-step).

Scope

This topic covers the E-step and M-step that define the algorithm, the monotone ascent property that guarantees the likelihood never decreases, its convergence behaviour and rate, canonical applications to mixture models and missing data, and extensions such as the generalized and Monte Carlo EM variants. Its relationship to other ascent methods is noted.

Core questions

How do the E-step and M-step jointly increase the observed-data likelihood?
Why does the algorithm guarantee a monotone increase in the likelihood?
How is the algorithm applied to mixture models and missing-data problems?
What governs its convergence rate, and how do accelerated variants address slow convergence?

Key concepts

Complete-data likelihood
E-step
M-step
Latent variables
Monotone convergence
Local maxima

Key theories

E-step and M-step alternation: The E-step computes the expectation of the complete-data log-likelihood under the current parameters, and the M-step maximizes this expectation; iterating drives the parameters toward a stationary point of the observed-data likelihood.
Monotone ascent property: Each iteration provably does not decrease the observed-data likelihood, which gives the algorithm its stability, though it can converge slowly and only to a local maximum depending on initialization.

Clinical relevance

The expectation-maximization algorithm is the standard engine for fitting finite mixture models, hidden Markov models, factor and latent-class models, and for handling missing data, making it foundational across statistics, machine learning, genetics and signal processing.

History

Dempster, Laird and Rubin unified many earlier special-case algorithms into the general expectation-maximization framework in 1977; subsequent work developed convergence theory, acceleration schemes, and Monte Carlo and generalized variants for intractable steps.

Key figures

Arthur Dempster
Nan Laird
Donald Rubin
Geoffrey McLachlan

Seminal works

dempster1977
mclachlan2008

Frequently asked questions

Why use the expectation-maximization algorithm instead of direct maximization?: When data are incomplete or a model has latent variables, the observed-data likelihood can be awkward to maximize directly. The algorithm works with the simpler complete-data likelihood and converges reliably with a guaranteed non-decrease at each step.
Does the algorithm always find the global maximum?: No. It converges to a stationary point that depends on the starting values, which may be a local maximum or saddle point. Running it from several initializations is the usual safeguard.