Hierarchical Bayesian Models
Hierarchical Bayesian models share information across related units by giving their parameters a common prior, yielding partial pooling that typically lowers overall estimation error, with the largest gains for groups that have little data of their own.
Definition
A hierarchical Bayesian model places a prior on group-specific parameters that itself depends on higher-level parameters with their own (hyper)priors, so that information is borrowed across groups and uncertainty propagates through all levels of the hierarchy.
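One common concrete instance of this definition is the normal-normal model. The notation is an assumption introduced here for illustration: $y_{ij}$ is observation $i$ in group $j$, $\theta_j$ is the group-specific mean, $\sigma^2$ is a within-group variance, and $\mu$ and $\tau$ are the population-level mean and standard deviation that receive the hyperprior.

```latex
\begin{aligned}
y_{ij} \mid \theta_j, \sigma &\sim \mathcal{N}(\theta_j, \sigma^2),
  & i = 1, \dots, n_j,\quad j = 1, \dots, J \\
\theta_j \mid \mu, \tau &\sim \mathcal{N}(\mu, \tau^2),
  & j = 1, \dots, J \\
(\mu, \tau) &\sim p(\mu, \tau)
\end{aligned}
```

The three lines correspond to the data level, the group level, and the hyperprior level. Other likelihoods and priors fit the same template.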
Scope
This area covers multilevel model structure and partial pooling, the role of hyperpriors on population-level parameters, the resulting shrinkage of group estimates toward the overall mean, and the empirical Bayes approximation, which plugs in point estimates of the hyperparameters obtained from the data instead of assigning them a hyperprior.
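As a rough sketch of the empirical Bayes step, the function below estimates the population mean and between-group variance by a simple method of moments. It assumes a normal model in which each observed group mean has a known sampling variance. The function name `moment_hyperparameters` and its arguments are hypothetical, and other estimators (for example marginal maximum likelihood) are equally legitimate.

```python
from statistics import mean, variance


def moment_hyperparameters(ybars, se2):
    """Method-of-moments estimates of (mu, tau^2) for a normal-normal model.

    ybars: observed group means.
    se2:   known sampling variance of each group mean (assumed, not estimated).

    Marginally, Var(ybar_j) = tau^2 + se2_j, so the sample variance of the
    group means overestimates tau^2 by roughly the average sampling variance.
    """
    if len(ybars) < 2 or len(ybars) != len(se2):
        raise ValueError("need at least two groups and one variance per group")
    mu_hat = mean(ybars)
    # Truncate at zero: a negative moment estimate means the observed spread
    # is no larger than sampling noise alone would produce.
    tau2_hat = max(0.0, variance(ybars) - mean(se2))
    return mu_hat, tau2_hat
```

The estimates can then be plugged into the group-level prior as if they were known. This ignores uncertainty in the hyperparameters, which is the main difference from a full hierarchical analysis.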
Sub-topics
Core questions
- How does a hierarchical prior induce partial pooling across groups?
- What roles do hyperparameters and hyperpriors play in the model?
- Why and how are group-level estimates shrunk toward the population mean?
- How does empirical Bayes approximate a full hierarchical analysis?
Key concepts
- multilevel model
- partial pooling
- hyperparameter
- hyperprior
- shrinkage
- random effects
- empirical Bayes
- borrowing strength
Key theories
- Partial pooling
- By estimating group parameters jointly under a shared prior, hierarchical models interpolate between no pooling and complete pooling, with the degree of pooling determined by the data.
- Shrinkage and Stein's effect
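- Shrinking group estimates toward the population mean reduces total estimation error. The phenomenon is connected to Stein's result that, under squared-error loss, the vector of sample means is inadmissible when three or more normal means are estimated simultaneously; the James-Stein estimator has lower total risk.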
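The shrinkage effect can be checked with a small simulation. The sketch below uses the positive-part James-Stein estimator that shrinks toward the grand mean, assuming every group mean has the same known sampling variance. The function names, the default of 20 groups, and the simulation settings are illustrative assumptions, not values from any particular study.

```python
import random


def james_stein(y, sigma2=1.0):
    """Positive-part James-Stein estimates, shrinking toward the grand mean.

    y:      observed group means, each with known sampling variance sigma2.
    Shrinking toward an estimated mean uses the factor (k - 3), so at least
    four groups are required.
    """
    k = len(y)
    if k < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 groups")
    grand = sum(y) / k
    spread = sum((v - grand) ** 2 for v in y)
    if spread == 0.0:
        return list(y)  # all groups identical: nothing to shrink
    factor = max(0.0, 1.0 - (k - 3) * sigma2 / spread)
    return [grand + factor * (v - grand) for v in y]


def compare_errors(k=20, tau=1.0, reps=2000, seed=0):
    """Average total squared error of raw means vs. James-Stein estimates."""
    rng = random.Random(seed)
    raw_total = 0.0
    js_total = 0.0
    for _ in range(reps):
        theta = [rng.gauss(0.0, tau) for _ in range(k)]   # true group means
        y = [rng.gauss(t, 1.0) for t in theta]            # noisy observations
        est = james_stein(y, sigma2=1.0)
        raw_total += sum((a - t) ** 2 for a, t in zip(y, theta))
        js_total += sum((a - t) ** 2 for a, t in zip(est, theta))
    return raw_total / reps, js_total / reps
```

Calling `compare_errors()` returns the two average errors. The second value should be the smaller one, even though any individual group may be estimated worse after shrinkage.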
Clinical relevance
Hierarchical models are a standard tool for meta-analysis, multi-center clinical trials, small-area estimation, and other settings with many related groups, because partial pooling stabilizes estimates where data are sparse.
History
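Lindley and Smith formalized the Bayesian linear hierarchical model in 1972, building on the Stein shrinkage and empirical Bayes work that began in the 1950s and revealed the benefits of shrinkage; Efron and Morris developed the empirical Bayes reading of Stein's estimator further during the 1970s. Computational advances, notably Markov chain Monte Carlo methods from the 1990s onward, later made fully Bayesian hierarchical modeling routine across applied fields.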
Debates
- Priors on variance components
- The choice of hyperprior for group-level variances strongly affects shrinkage when groups are few, and there is ongoing discussion of which weakly informative priors behave best.
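The sensitivity to the hyperprior can be illustrated on a grid. The sketch below assumes a normal-normal model with known standard errors and a flat prior on the population mean, which is integrated out analytically, and compares the posterior mean of the group-level standard deviation `tau` under two priors. The four data points, the truncation at `tau_max`, and the half-Cauchy scale are hypothetical choices made only for illustration, not recommendations.

```python
import math


def tau_log_marginal(tau, y, se):
    """log p(y | tau) up to a constant, with mu integrated out (flat prior)."""
    prec = [1.0 / (s * s + tau * tau) for s in se]   # marginal precisions
    total_prec = sum(prec)
    mu_hat = sum(p * v for p, v in zip(prec, y)) / total_prec
    out = 0.5 * sum(math.log(p) for p in prec) - 0.5 * math.log(total_prec)
    out -= 0.5 * sum(p * (v - mu_hat) ** 2 for p, v in zip(prec, y))
    return out


def posterior_mean_tau(y, se, log_prior, tau_max=50.0, steps=2000):
    """Posterior mean of tau on a midpoint grid over (0, tau_max)."""
    grid = [tau_max * (i + 0.5) / steps for i in range(steps)]
    logw = [tau_log_marginal(t, y, se) + log_prior(t) for t in grid]
    top = max(logw)                                   # stabilise the exponentials
    w = [math.exp(v - top) for v in logw]
    return sum(t * wi for t, wi in zip(grid, w)) / sum(w)


def log_uniform(tau):
    """Flat prior on tau over the (truncated) grid."""
    return 0.0


def log_half_cauchy(scale):
    """Unnormalised log density of a half-Cauchy prior with the given scale."""
    return lambda tau: -math.log(1.0 + (tau / scale) ** 2)


# Hypothetical data: four group estimates with their standard errors.
y_demo = [2.0, -1.0, 4.5, 0.5]
se_demo = [2.0, 2.5, 3.0, 2.0]
flat_mean = posterior_mean_tau(y_demo, se_demo, log_uniform)
cauchy_mean = posterior_mean_tau(y_demo, se_demo, log_half_cauchy(1.0))
```

With so few groups the likelihood says little about `tau`, so `flat_mean` and `cauchy_mean` can differ noticeably. A smaller `tau` implies stronger shrinkage of every group estimate.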
Key figures
- Dennis Lindley
- Adrian Smith
- Bradley Efron
- Carl Morris
- Andrew Gelman
Related topics
Seminal works
- gelman2013
- efron1975
Frequently asked questions
- What is partial pooling?
- Partial pooling estimates each group's parameter using both its own data and information from the other groups through a shared prior, producing estimates between fully separate (no pooling) and fully combined (complete pooling) analyses.
- Why are hierarchical estimates 'shrunk'?
- The shared prior pulls each group's estimate toward the overall mean by an amount that depends on how noisy that group's data are. Groups with noisier data (fewer observations or larger variance) are shrunk more, which reduces overall error.
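The dependence of shrinkage on noise can be made concrete for the normal-normal case. The sketch below assumes the within-group variance `sigma2`, the population mean `mu`, and the between-group variance `tau2` are all known; the function name `shrink` is hypothetical. In a full hierarchical analysis these quantities would themselves be uncertain.

```python
def shrink(ybar, n, sigma2, mu, tau2):
    """Posterior mean of a group mean under a normal-normal model.

    ybar:   sample mean of the group
    n:      number of observations in the group
    sigma2: within-group variance (assumed known)
    mu:     population mean (assumed known)
    tau2:   between-group variance (assumed known)

    Returns (posterior_mean, shrinkage), where shrinkage is the weight placed
    on mu: 0 means no pooling, 1 means complete pooling.
    """
    se2 = sigma2 / n                      # sampling variance of ybar
    if tau2 + se2 <= 0.0:
        raise ValueError("tau2 and sigma2 / n cannot both be zero")
    weight_on_data = tau2 / (tau2 + se2)
    posterior_mean = weight_on_data * ybar + (1.0 - weight_on_data) * mu
    return posterior_mean, 1.0 - weight_on_data


# Same observed mean, different sample sizes: the small group is shrunk more.
small_group = shrink(ybar=10.0, n=1, sigma2=4.0, mu=0.0, tau2=4.0)
large_group = shrink(ybar=10.0, n=100, sigma2=4.0, mu=0.0, tau2=4.0)
```

Here the single-observation group is pulled halfway to the population mean, while the group with 100 observations stays close to its own sample mean.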