Markov Chain Monte Carlo
Markov chain Monte Carlo samples from a complex target distribution by simulating a Markov chain engineered to have that distribution as its unique stationary law.
Definition
Markov chain Monte Carlo is a family of algorithms that estimate expectations under a target probability distribution by running an ergodic Markov chain whose stationary distribution is the target and averaging a function over the chain's path.
Scope
This topic covers the design of transition kernels with a prescribed stationary distribution, the Metropolis-Hastings algorithm and its acceptance rule, the Gibbs sampler for conditional updates, convergence diagnostics and burn-in, the effect of autocorrelation on estimator variance, and the link between mixing times and the computational cost of sampling.
Core questions
- How is a Markov chain constructed to have a desired stationary distribution?
- Why does the Metropolis-Hastings acceptance rule produce the correct stationary law?
- How does the Gibbs sampler exploit conditional distributions?
- How long must a chain run before its samples are usable, and how is this assessed?
Key theories
- Metropolis-Hastings construction
- Proposing moves from an arbitrary kernel and accepting them with a probability built from the target density ratio yields a reversible chain whose stationary distribution is exactly the target, requiring only the target up to a normalising constant.
- Ergodic averages and Monte Carlo estimation
- Because the chain is ergodic with the target as its stationary law, time averages of a function along the chain converge almost surely to the target expectation, justifying the use of simulated paths as samples.
Clinical relevance
Markov chain Monte Carlo is the workhorse of modern Bayesian statistics, statistical physics, and machine learning, enabling inference over high-dimensional posterior distributions, partition functions, and energy landscapes that cannot be integrated analytically; its reliability depends on the underlying chain mixing rapidly enough.
History
The acceptance-rejection chain originated in the 1953 Metropolis algorithm for statistical physics, was generalised by Hastings in 1970, and was recast for statistics through the Gibbs sampler of Geman and Geman in 1984 and the influential Bayesian applications of Gelfand and Smith around 1990, which launched the computational Bayesian revolution.
Key figures
- Nicholas Metropolis
- W. Keith Hastings
- Stuart Geman
- Donald Geman
Related topics
Seminal works
- robertCasella2004
- hastings1970
Frequently asked questions
- Why use a Markov chain to draw samples?
- For high-dimensional or unnormalised target distributions, direct sampling is infeasible; a Markov chain that converges to the target lets you generate dependent but correctly distributed samples after it reaches equilibrium.
- What is burn-in?
- It is the initial portion of the chain discarded because the chain has not yet converged to its stationary distribution, so those early states would bias the estimates.