Why use a Markov chain to draw samples?

For high-dimensional or unnormalised target distributions, direct sampling is infeasible; a Markov chain that converges to the target lets you generate dependent but correctly distributed samples after it reaches equilibrium.

It is the initial portion of the chain discarded because the chain has not yet converged to its stationary distribution, so those early states would bias the estimates.

Markov Chain Monte Carlo

Markov chain Monte Carlo samples from a complex target distribution by simulating a Markov chain engineered to have that distribution as its unique stationary law.

Troba un tema amb PaperMindAviatFind papers & topics

Tools & resources

Baixa les diapositives

Learn & explore

VídeoAviat

Definition

Markov chain Monte Carlo is a family of algorithms that estimate expectations under a target probability distribution by running an ergodic Markov chain whose stationary distribution is the target and averaging a function over the chain's path.

Scope

This topic covers the design of transition kernels with a prescribed stationary distribution, the Metropolis-Hastings algorithm and its acceptance rule, the Gibbs sampler for conditional updates, convergence diagnostics and burn-in, the effect of autocorrelation on estimator variance, and the link between mixing times and the computational cost of sampling.

Core questions

How is a Markov chain constructed to have a desired stationary distribution?
Why does the Metropolis-Hastings acceptance rule produce the correct stationary law?
How does the Gibbs sampler exploit conditional distributions?
How long must a chain run before its samples are usable, and how is this assessed?

Key theories

Metropolis-Hastings construction: Proposing moves from an arbitrary kernel and accepting them with a probability built from the target density ratio yields a reversible chain whose stationary distribution is exactly the target, requiring only the target up to a normalising constant.
Ergodic averages and Monte Carlo estimation: Because the chain is ergodic with the target as its stationary law, time averages of a function along the chain converge almost surely to the target expectation, justifying the use of simulated paths as samples.

Clinical relevance

Markov chain Monte Carlo is the workhorse of modern Bayesian statistics, statistical physics, and machine learning, enabling inference over high-dimensional posterior distributions, partition functions, and energy landscapes that cannot be integrated analytically; its reliability depends on the underlying chain mixing rapidly enough.

History

The acceptance-rejection chain originated in the 1953 Metropolis algorithm for statistical physics, was generalised by Hastings in 1970, and was recast for statistics through the Gibbs sampler of Geman and Geman in 1984 and the influential Bayesian applications of Gelfand and Smith around 1990, which launched the computational Bayesian revolution.

Key figures

Nicholas Metropolis
W. Keith Hastings
Stuart Geman
Donald Geman

Seminal works

robertCasella2004
hastings1970

Frequently asked questions

Why use a Markov chain to draw samples?: For high-dimensional or unnormalised target distributions, direct sampling is infeasible; a Markov chain that converges to the target lets you generate dependent but correctly distributed samples after it reaches equilibrium.
What is burn-in?: It is the initial portion of the chain discarded because the chain has not yet converged to its stationary distribution, so those early states would bias the estimates.