Bayesian Nonparametrics
Bayesian nonparametrics places priors on infinite-dimensional objects such as distributions and functions, letting model complexity grow with the data instead of being fixed in advance.
Definition
Bayesian nonparametrics is the branch of Bayesian statistics that uses prior distributions over infinite-dimensional parameter spaces, so that the effective number of parameters can adapt to the data rather than being set by the analyst.
Scope
This area covers priors over probability measures and functions: the Dirichlet process and its use in mixture models for density estimation and clustering, Gaussian processes for flexible regression, and the stick-breaking and random-measure constructions that build these priors, together with posterior consistency results.
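The stick-breaking construction mentioned above can be sketched in a few lines. The following is a minimal illustration, not a reference implementation: it truncates the infinite sequence of weights at a fixed number of atoms, and the names `stick_breaking_weights`, `sample_truncated_dp`, `alpha`, `num_atoms`, and `base_sampler` are hypothetical choices made for this example. The concentration parameter `alpha` and the standard normal base measure are likewise assumptions.

```python
import numpy as np


def stick_breaking_weights(alpha, num_atoms, rng):
    """Truncated stick-breaking weights with breaks beta_k ~ Beta(1, alpha)."""
    betas = rng.beta(1.0, alpha, size=num_atoms)
    # Stick length remaining before the k-th break: prod_{j<k} (1 - beta_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def sample_truncated_dp(alpha, base_sampler, num_atoms, rng):
    """Approximate draw from a Dirichlet process as (atoms, weights).

    The exact construction uses infinitely many atoms; truncating at
    num_atoms leaves a small amount of unassigned probability mass.
    """
    weights = stick_breaking_weights(alpha, num_atoms, rng)
    atoms = base_sampler(num_atoms, rng)  # i.i.d. draws from the base measure
    return atoms, weights


rng = np.random.default_rng(0)
atoms, weights = sample_truncated_dp(
    alpha=2.0,
    base_sampler=lambda n, r: r.normal(0.0, 1.0, size=n),  # assumed base measure
    num_atoms=200,
    rng=rng,
)
leftover = 1.0 - weights.sum()  # mass lost to truncation
print(f"mass not covered by the truncation: {leftover:.3e}")
```

The resulting random measure is discrete: it puts weight `weights[k]` on `atoms[k]`. Smaller values of `alpha` concentrate the mass on fewer atoms, while larger values spread it more evenly.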
Sub-topics
Core questions
- How can a prior be defined over an infinite-dimensional space such as the set of distributions?
- How does the Dirichlet process support density estimation and clustering with an unknown number of components?
- How do Gaussian processes place priors over functions for flexible regression?
- When does the posterior concentrate on the truth as data accumulate?
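To make the Gaussian-process question concrete, here is a minimal sketch of GP regression with a zero-mean prior. It assumes a squared-exponential (RBF) kernel and Gaussian observation noise; the function names `rbf_kernel` and `gp_posterior`, the hyperparameter values, and the toy sine data are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, **kernel_args)
    K += noise_var * np.eye(len(x_train))  # observation noise on the diagonal
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)

    # Solve via Cholesky factor K = L L^T rather than inverting K directly.
    L = np.linalg.cholesky(K)
    weights = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ weights
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov


# Toy data (an assumption for illustration): noise-free samples of sin(x).
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_test = np.array([1.25, 2.5, 3.75])

mean, cov = gp_posterior(x_train, y_train, x_test, noise_var=1e-4)
print("posterior mean:", mean)
print("posterior std: ", np.sqrt(np.clip(np.diag(cov), 0.0, None)))
```

The prior over functions enters only through the kernel: its length scale and variance encode the smoothness and amplitude assumed before seeing data, and the posterior combines those assumptions with the observations.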
Key concepts
- Dirichlet process
- Gaussian process
- stick-breaking construction
- random measure
- infinite mixture model
- posterior consistency
- nonparametric prior
Key theories
- Dirichlet process prior
- Ferguson's Dirichlet process is a distribution over probability measures that is conjugate under i.i.d. sampling: the posterior given the observations is again a Dirichlet process. This makes it the foundational nonparametric prior for an unknown distribution.
- Posterior consistency and rates
- Under suitable conditions on the prior and on the true distribution or function, the posterior of a nonparametric Bayesian procedure can be shown to concentrate around the truth at near-optimal rates, which gives the priors a frequentist justification.
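The conjugacy of the Dirichlet process prior can be written out explicitly. The notation here is an assumption introduced for this illustration: $\alpha > 0$ is the concentration parameter, $G_0$ the base measure, and $\delta_x$ a point mass at $x$.

```latex
% Prior and sampling model
G \sim \mathrm{DP}(\alpha, G_0), \qquad
X_1, \dots, X_n \mid G \overset{\text{iid}}{\sim} G

% Posterior: again a Dirichlet process
G \mid X_1, \dots, X_n \sim
\mathrm{DP}\!\left(\alpha + n,\;
\frac{\alpha\, G_0 + \sum_{i=1}^{n} \delta_{X_i}}{\alpha + n}\right)

% Posterior mean of G(A) for a measurable set A
\mathbb{E}\left[G(A) \mid X_1, \dots, X_n\right]
= \frac{\alpha}{\alpha + n}\, G_0(A)
+ \frac{n}{\alpha + n} \cdot \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}(A)
```

The posterior mean is a weighted average of the prior guess $G_0$ and the empirical distribution, with the weight on the prior shrinking as $n$ grows.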
Clinical relevance
Nonparametric Bayesian models are used in genomics, machine learning, and spatial statistics for flexible density estimation, clustering with an unknown number of groups, and nonlinear regression, settings where rigid parametric forms would be too restrictive.
History
Ferguson introduced the Dirichlet process in 1973, and Sethuraman's 1994 stick-breaking representation made it computationally tractable. Gaussian-process methods, together with a rich theory of posterior consistency and contraction rates synthesized by Ghosal and van der Vaart in 2017, established Bayesian nonparametrics as a mature field.
Debates
- Prior influence in infinite dimensions
- In nonparametric models the prior's influence need not vanish with more data the way it typically does in regular parametric models, so its concentration and smoothness assumptions can strongly affect inference, raising questions about robustness and calibration.
Key figures
- Thomas Ferguson
- David Blackwell
- Jayaram Sethuraman
- Michael Jordan
- Aad van der Vaart
Related topics
Seminal works
- ferguson1973
- ghosal2017
Frequently asked questions
- Does 'nonparametric' mean there are no parameters?
- No. It means the model has infinitely many parameters, or equivalently a parameter that is a whole function or distribution, so that its complexity can grow with the data rather than being fixed in advance.
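One way to see complexity growing with the data is to simulate the clustering induced by a Dirichlet process. The sketch below uses the Chinese restaurant process, the sequential assignment rule that arises when the random measure of a Dirichlet process is integrated out; that name, the function `crp_cluster_counts`, and the value `alpha=1.0` are assumptions introduced for this illustration.

```python
import numpy as np


def crp_cluster_counts(n, alpha, rng):
    """Simulate a Chinese restaurant process for n observations.

    Returns an array whose i-th entry is the number of occupied
    clusters after the first i + 1 observations have been assigned.
    """
    counts = []  # counts[k] = current size of cluster k
    trajectory = np.empty(n, dtype=int)
    for i in range(n):
        # Join cluster k with probability counts[k] / (alpha + i),
        # or open a new cluster with probability alpha / (alpha + i).
        probs = np.array(counts + [alpha], dtype=float) / (alpha + i)
        choice = rng.choice(len(probs), p=probs)
        if choice == len(counts):
            counts.append(1)
        else:
            counts[choice] += 1
        trajectory[i] = len(counts)
    return trajectory


rng = np.random.default_rng(0)
trajectory = crp_cluster_counts(1000, alpha=1.0, rng=rng)
for n in (10, 100, 1000):
    print(f"n = {n:4d}: {trajectory[n - 1]} clusters")
```

No number of clusters is fixed in advance: new clusters keep appearing as observations arrive, although increasingly rarely. The expected number of clusters grows roughly like `alpha * log(n)`.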