Bayes' Theorem and the Posterior
Bayes' theorem expresses the posterior distribution of unknown parameters as proportional to the likelihood of the data times the prior. This relation is the basis of all Bayesian inference.
Definition
Bayes' theorem states that the posterior density p(theta | y) equals the likelihood p(y | theta) times the prior p(theta), divided by the marginal likelihood p(y). Because p(y) does not depend on theta, the posterior is often written as proportional to the likelihood times the prior.
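Written out in symbols, the definition follows from factoring the joint density of theta and y in two ways. The integral form of p(y) assumes a continuous parameter; for a discrete parameter the integral becomes a sum.

```latex
\begin{align}
p(\theta, y) &= p(y \mid \theta)\, p(\theta) = p(\theta \mid y)\, p(y), \\
p(\theta \mid y) &= \frac{p(y \mid \theta)\, p(\theta)}{p(y)}, \\
p(y) &= \int p(y \mid \theta)\, p(\theta)\, d\theta, \\
p(\theta \mid y) &\propto p(y \mid \theta)\, p(\theta).
\end{align}
```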
Scope
This topic covers the statement and derivation of Bayes' theorem for inference, the proportionality form, the marginal likelihood that normalizes the posterior, and how summaries such as posterior means, credible intervals, and the posterior predictive distribution are obtained.
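As a rough illustration of how the marginal likelihood normalizes the posterior, the sketch below approximates a posterior on a grid. The model (binomial data with a Beta prior), the data values (7 successes in 10 trials), and the function name `grid_posterior` are assumptions chosen for the example, not part of this topic's definition.

```python
import math

def grid_posterior(successes, trials, prior_a=1.0, prior_b=1.0, n_grid=2001):
    """Grid approximation to the posterior of a binomial success probability.

    Assumes a Beta(prior_a, prior_b) prior; both are illustrative choices.
    """
    # Midpoint grid on (0, 1) avoids theta = 0 and theta = 1, where log is undefined.
    width = 1.0 / n_grid
    thetas = [(i + 0.5) * width for i in range(n_grid)]

    unnormalized = []
    for t in thetas:
        # Constants that do not depend on theta are dropped from both terms.
        log_prior = (prior_a - 1) * math.log(t) + (prior_b - 1) * math.log(1 - t)
        log_lik = successes * math.log(t) + (trials - successes) * math.log(1 - t)
        unnormalized.append(math.exp(log_prior + log_lik))

    # Numerical stand-in for the marginal likelihood (up to the dropped constants).
    evidence = sum(u * width for u in unnormalized)
    density = [u / evidence for u in unnormalized]
    return thetas, density, width

thetas, density, width = grid_posterior(successes=7, trials=10)

# Posterior mean as a numerical integral of theta times the posterior density.
post_mean = sum(t * d * width for t, d in zip(thetas, density))
print(round(post_mean, 3))
```

Under these assumptions the exact posterior is Beta(8, 4), so the grid estimate of the mean should be close to 8/12. Grid approximation is practical only for one or two parameters, which is one reason sampling methods are used in higher dimensions.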
Core questions
- How is the posterior distribution derived from the prior and the likelihood?
- What is the marginal likelihood and why does it act as a normalizing constant?
- How are point estimates and credible intervals extracted from a posterior?
- What is the posterior predictive distribution and how is it computed?
Key concepts
- prior
- likelihood
- posterior
- marginal likelihood
- credible interval
- posterior predictive distribution
- normalizing constant
Key theories
- Posterior proportionality
- Because the marginal likelihood is constant in the parameter, inference depends only on the product of likelihood and prior up to normalization, which is the form exploited by most computational methods.
- Posterior predictive distribution
- Future or replicated data are predicted by averaging the sampling distribution over the posterior, integrating out parameter uncertainty rather than plugging in a point estimate.
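The averaging described for the posterior predictive distribution can be sketched by simulation: draw a parameter from the posterior, then draw new data given that parameter. The example assumes a binomial model with a Beta(1, 1) prior, for which the posterior is Beta(1 + successes, 1 + failures) by conjugacy; the data values and the function name `posterior_predictive_draws` are hypothetical.

```python
import random

def posterior_predictive_draws(successes, trials, m_future, n_draws=20000,
                               prior_a=1.0, prior_b=1.0, seed=0):
    """Simulate future success counts by averaging over posterior uncertainty."""
    rng = random.Random(seed)
    # Conjugate update: Beta prior + binomial likelihood gives a Beta posterior.
    post_a = prior_a + successes
    post_b = prior_b + trials - successes

    draws = []
    for _ in range(n_draws):
        theta = rng.betavariate(post_a, post_b)  # parameter draw from the posterior
        # New data drawn from the sampling distribution given this theta.
        y_new = sum(rng.random() < theta for _ in range(m_future))
        draws.append(y_new)
    return draws

draws = posterior_predictive_draws(successes=7, trials=10, m_future=10)
pred_mean = sum(draws) / len(draws)
pred_var = sum((d - pred_mean) ** 2 for d in draws) / (len(draws) - 1)

# Variance if the posterior mean were plugged in as a fixed point estimate.
p_hat = 8.0 / 12.0
plug_in_var = 10 * p_hat * (1.0 - p_hat)
print(pred_mean, pred_var, plug_in_var)
```

In this setup the simulated predictive variance should exceed the plug-in binomial variance, because the predictive distribution carries the remaining uncertainty about theta as well as sampling noise.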
Clinical relevance
Posterior inference is used wherever a quantity of interest must be estimated with quantified uncertainty, including diagnostic test interpretation, parameter estimation in the physical sciences, and probabilistic forecasting.
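Diagnostic test interpretation is the simplest case to work through, since the parameter takes only two values (disease present or absent). The sketch below applies Bayes' theorem to a positive test result; the prevalence, sensitivity, and specificity are made-up numbers for illustration, and `positive_predictive_value` is a hypothetical helper name.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Posterior probability of disease given a positive test result."""
    # Prior: P(disease) = prevalence.
    # Likelihood: P(positive | disease) = sensitivity,
    #             P(positive | no disease) = 1 - specificity.
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    # Marginal probability of a positive test: the normalizing constant.
    evidence = true_pos + false_pos
    return true_pos / evidence

# Illustrative values only: 1% prevalence, 95% sensitivity, 90% specificity.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.95, specificity=0.90)
print(round(ppv, 3))
```

With these assumed inputs the posterior probability of disease after a positive result is below 10%, because the low prior (prevalence) outweighs the fairly accurate test.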
History
The rule originates in Bayes' 1763 essay and was generalized by Laplace into the method of inverse probability. The modern emphasis on the full posterior distribution, rather than a single inverse-probability estimate, was consolidated in the 20th-century Bayesian literature.
Key figures
- Thomas Bayes
- Pierre-Simon Laplace
- Harold Jeffreys
Related topics
Seminal works
- gelman2013
- bayes1763
Frequently asked questions
- What is a credible interval?
- A credible interval is a range that contains the parameter with a stated posterior probability (for example 95%); unlike a frequentist confidence interval, it is a direct probability statement about the parameter given the data and the prior.
- Why can the posterior be written without computing the marginal likelihood?
- The marginal likelihood is a constant with respect to the parameter, so it only rescales the posterior; many algorithms such as MCMC need the posterior only up to this constant.
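Both answers can be illustrated together with a minimal random-walk Metropolis sketch: the sampler evaluates only the unnormalized log posterior, and an equal-tailed credible interval is then read off the sorted draws. The binomial model with a uniform prior, the data (7 successes in 10 trials), the tuning values, and all function names are assumptions for the example. This is one simple MCMC variant, not a recommended production sampler.

```python
import math
import random

def log_unnormalized_posterior(theta, successes=7, trials=10):
    """Log likelihood plus log uniform prior on (0, 1); p(y) is never computed."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)

def metropolis(log_target, n_samples=20000, burn_in=2000, step=0.2,
               start=0.5, seed=1):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Only the ratio of posterior values is used, so the normalizing
        # constant cancels in the acceptance probability.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            samples.append(current)
    return samples

def equal_tailed_interval(samples, mass=0.95):
    """Interval that leaves (1 - mass) / 2 of the draws in each tail."""
    ordered = sorted(samples)
    tail = (1.0 - mass) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper

samples = metropolis(log_unnormalized_posterior)
post_mean = sum(samples) / len(samples)
lower, upper = equal_tailed_interval(samples, mass=0.95)
print(post_mean, lower, upper)
```

Under these assumptions the exact posterior is Beta(8, 4), so the sampled mean should land near 8/12. The interval endpoints vary slightly with the seed and chain length.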