Hyperpriors and Shrinkage
Hyperpriors are the priors placed on the top-level parameters of a hierarchical model, and they control how strongly group estimates are shrunk toward the population mean.
Definition
A hyperprior is a prior distribution on the hyperparameters that govern the distribution of group-level parameters; together with the data it determines the posterior for the group-level variance and hence the degree of shrinkage applied to each group.
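As a minimal sketch, assume the normal-normal model: each group estimate y_j ~ N(θ_j, σ_j²) with σ_j² known, and θ_j ~ N(μ, τ²), where τ² is the group-level variance. Conditional on μ and τ², the posterior mean of θ_j is B_j μ + (1 − B_j) y_j with B_j = σ_j²/(σ_j² + τ²). A small τ² therefore pulls every group toward μ, and a large τ² leaves the raw estimates nearly untouched. The function names below are illustrative and do not come from any library.

```python
def shrinkage_factor(sigma2: float, tau2: float) -> float:
    """Fraction of the distance from y_j to mu removed by partial pooling.

    Assumes the normal-normal model y_j ~ N(theta_j, sigma2),
    theta_j ~ N(mu, tau2), with mu and tau2 treated as known.
    """
    return sigma2 / (sigma2 + tau2)


def conditional_posterior_mean(y: float, sigma2: float, mu: float, tau2: float) -> float:
    """Posterior mean of theta_j given y_j, mu and tau2 (a precision-weighted average)."""
    b = shrinkage_factor(sigma2, tau2)
    return b * mu + (1.0 - b) * y


if __name__ == "__main__":
    # One hypothetical group: observed estimate 10, sampling variance 4, population mean 0.
    for tau2 in (0.01, 1.0, 4.0, 100.0):
        b = shrinkage_factor(4.0, tau2)
        est = conditional_posterior_mean(10.0, 4.0, 0.0, tau2)
        print(f"tau^2 = {tau2:>6}: B = {b:.3f}, estimate = {est:.3f}")
```

As τ² grows, the estimate for the group observed at 10 moves from almost 0 (complete pooling) to almost 10 (no pooling). In a full hierarchical model τ² is itself uncertain, which is why its hyperprior matters.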
Scope
This topic covers the specification of priors for hierarchical means and especially variance components, the way the group-level variance governs shrinkage, the danger of degenerate posteriors from poor variance priors, and recommended weakly informative choices such as half-Cauchy and half-normal priors.
Core questions
- Why does the group-level variance control the amount of shrinkage?
- What goes wrong when an inappropriate prior is used for a variance component?
- Which weakly informative hyperpriors are recommended for scale parameters?
- How does shrinkage relate to the Stein and empirical Bayes results?
Key concepts
- hyperprior
- variance component
- half-Cauchy prior
- inverse-gamma prior
- shrinkage
- James-Stein estimator
- degenerate posterior
Key theories
- Variance-component priors
- The hyperprior on the group-level standard deviation strongly influences inference when groups are few; folded-noncentral-t priors, which include the half-Cauchy as a special case, avoid the pathologies of conventional inverse-gamma choices.
- Shrinkage as risk reduction
- Shrinking three or more related estimates toward a common center can lower their total mean squared error. The same principle makes the James-Stein estimator dominate the sample mean when three or more normal means are estimated under squared-error loss.
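The risk-reduction claim can be checked by simulation. The sketch below assumes independent x_i ~ N(θ_i, 1) with known unit variance and uses the classical James-Stein estimator, which shrinks toward the origin. The true means and the helper names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def james_stein(x: Sequence[float], sigma2: float = 1.0) -> List[float]:
    """Classical James-Stein estimate, shrinking x toward the origin.

    Assumes x_i ~ N(theta_i, sigma2) independently, sigma2 known, len(x) >= 3.
    """
    p = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (p - 2) * sigma2 / norm2
    return [factor * v for v in x]


def compare_risk(theta: Sequence[float], reps: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo total squared-error risk of the raw estimates and of James-Stein."""
    rng = random.Random(seed)
    sse_raw = sse_js = 0.0
    for _ in range(reps):
        x = [rng.gauss(t, 1.0) for t in theta]  # one noisy observation per mean
        js = james_stein(x)
        sse_raw += sum((xi - ti) ** 2 for xi, ti in zip(x, theta))
        sse_js += sum((ji - ti) ** 2 for ji, ti in zip(js, theta))
    return sse_raw / reps, sse_js / reps


if __name__ == "__main__":
    theta = [i / 5 for i in range(10)]  # hypothetical true means
    raw, js = compare_risk(theta)
    print(f"raw estimates, total MSE: {raw:.2f}")
    print(f"James-Stein, total MSE:   {js:.2f}")
```

The raw estimates have total risk close to 10 (one unit per mean). James-Stein comes in lower. A common refinement is the positive-part variant, which truncates the factor at zero. Shrinking toward the grand mean instead of the origin (with p − 3 in place of p − 2) matches the "common center" above more closely.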
Clinical relevance
Sensible hyperpriors prevent overconfident or unstable estimates of between-group variation in meta-analysis and multi-site studies, where the number of groups is often small and the variance is hard to estimate.
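To see how much the hyperprior can matter with few groups, the sketch below computes the marginal posterior of τ on a grid. It assumes the normal hierarchical model with known sampling variances and a flat prior on μ, which is integrated out analytically. It compares a half-Cauchy(0, 1) prior on τ with inverse-gamma(ε, ε) priors on τ² for several ε. The five group estimates are hypothetical numbers chosen to agree closely, and the half-Cauchy scale of 1 is an assumption.

```python
import math
from typing import Callable, Dict, Sequence


def log_marginal_lik(tau: float, y: Sequence[float], sigma: Sequence[float]) -> float:
    """log p(y | tau), up to a constant, for y_j ~ N(theta_j, sigma_j^2),
    theta_j ~ N(mu, tau^2), with mu integrated out under a flat prior."""
    w = [1.0 / (s * s + tau * tau) for s in sigma]  # precision of y_j given mu, tau
    v_mu = 1.0 / sum(w)
    mu_hat = v_mu * sum(wi * yi for wi, yi in zip(w, y))
    ll = 0.5 * math.log(v_mu)
    for wi, yi in zip(w, y):
        ll += 0.5 * math.log(wi) - 0.5 * wi * (yi - mu_hat) ** 2
    return ll


def log_half_cauchy(tau: float, scale: float) -> float:
    """Unnormalized log density of a half-Cauchy(0, scale) prior on tau."""
    return -math.log1p((tau / scale) ** 2)


def log_inv_gamma_var(tau: float, eps: float) -> float:
    """Unnormalized log density on tau implied by tau^2 ~ Inv-Gamma(eps, eps)."""
    return (-2.0 * eps - 1.0) * math.log(tau) - eps / (tau * tau)


def posterior_mass_below(threshold: float, log_prior: Callable[[float], float],
                         y: Sequence[float], sigma: Sequence[float],
                         lo: float = 1e-4, hi: float = 100.0, n: int = 5000) -> float:
    """P(tau < threshold | y) by trapezoid integration on a log-spaced grid."""
    grid = [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]
    logs = [log_prior(t) + log_marginal_lik(t, y, sigma) for t in grid]
    top = max(logs)
    dens = [math.exp(v - top) for v in logs]  # rescale before exponentiating
    total = below = 0.0
    for i in range(n - 1):
        area = 0.5 * (dens[i] + dens[i + 1]) * (grid[i + 1] - grid[i])
        total += area
        if grid[i + 1] <= threshold:
            below += area
    return below / total


# Hypothetical estimates and standard errors from five closely agreeing groups.
Y = [1.0, -0.5, 0.8, 0.2, -0.3]
SIGMA = [1.0, 1.2, 0.9, 1.1, 1.0]

PRIORS: Dict[str, Callable[[float], float]] = {
    "half-Cauchy(0, 1) on tau": lambda t: log_half_cauchy(t, 1.0),
    "Inv-Gamma(1, 1) on tau^2": lambda t: log_inv_gamma_var(t, 1.0),
    "Inv-Gamma(0.01, 0.01) on tau^2": lambda t: log_inv_gamma_var(t, 0.01),
    "Inv-Gamma(0.001, 0.001) on tau^2": lambda t: log_inv_gamma_var(t, 0.001),
}

if __name__ == "__main__":
    for name, prior in PRIORS.items():
        mass = posterior_mass_below(0.1, prior, Y, SIGMA)
        print(f"{name:34s} P(tau < 0.1 | y) = {mass:.3f}")
```

With these inputs, the inverse-gamma posteriors change markedly as ε moves from 0.01 to 0.001, even though the data are unchanged. The half-Cauchy result depends only on its stated scale. A grid is adequate here because only one hyperparameter remains after μ is integrated out. With more hyperparameters, MCMC would be the usual tool.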
History
Shrinkage estimation grew from Stein's 1956 result that the sample mean is inadmissible for three or more normal means, the James-Stein estimator of 1961, and the empirical Bayes work of Efron and Morris in the 1970s. Gelman's 2006 analysis of priors for variance parameters clarified how hyperprior choice shapes shrinkage in fully Bayesian hierarchical models.
Debates
- Which prior for the group-level variance?
- Conventional inverse-gamma priors can be unintentionally informative near zero, so there is ongoing discussion about half-Cauchy, half-normal, and other weakly informative scale priors.
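The near-zero behaviour can be made explicit with a change of variables. Write τ for the group-level standard deviation (notation assumed here) and place τ² ~ Inv-Gamma(ε, ε):

```latex
p(\tau^2) = \frac{\epsilon^{\epsilon}}{\Gamma(\epsilon)}\,(\tau^2)^{-\epsilon-1}\,e^{-\epsilon/\tau^2}
\quad\Longrightarrow\quad
p(\tau) = p(\tau^2)\left|\frac{d\tau^2}{d\tau}\right|
        = \frac{2\,\epsilon^{\epsilon}}{\Gamma(\epsilon)}\,\tau^{-2\epsilon-1}\,e^{-\epsilon/\tau^2}.
```

For small ε and τ well above √ε, the kernel is close to τ^{-1}, the improper log-uniform prior, which gives roughly equal mass to every order of magnitude of τ. The factor e^{-ε/τ²} cuts this off near τ ≈ √(2ε). The location of that cutoff is set by ε rather than by the data. So when the data cannot rule out small τ, the posterior inherits a dependence on an arbitrary tuning constant.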
Key figures
- Andrew Gelman
- Bradley Efron
- Carl Morris
- Charles Stein
Related topics
Seminal works
- gelman2006
- efron1975
Frequently asked questions
- Why not just use a flat prior on the group-level variance?
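- A flat prior on the log of the group-level standard deviation yields an improper posterior. A flat prior on the variance itself gives a proper posterior only when there are enough groups. The default inverse-gamma(ε, ε) prior makes inference near zero sensitive to the arbitrary choice of ε. Each of these can produce collapsed or unstable posteriors when groups are few; weakly informative scale priors such as the half-Cauchy behave more reliably.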
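A tail argument shows which "flat" choices fail. As a sketch, assume the normal hierarchical model with known sampling variances, a flat prior on the population mean μ, and J groups. After integrating out μ, the marginal likelihood p(y | τ) tends to a positive constant as τ → 0 and decays like τ^{-(J-1)} as τ → ∞, where τ is the group-level standard deviation. Hence:

```latex
\begin{aligned}
p(\tau) \propto \tau^{-1}\ \text{(flat on } \log\tau) &:\quad \int_0 \tau^{-1}\,p(y\mid\tau)\,d\tau = \infty \ \text{for every } J,\\
p(\tau) \propto 1\ \text{(flat on } \tau) &:\quad \int^{\infty} p(y\mid\tau)\,d\tau < \infty \iff J \ge 3,\\
p(\tau) \propto \tau\ \text{(flat on } \tau^2) &:\quad \int^{\infty} \tau\,p(y\mid\tau)\,d\tau < \infty \iff J \ge 4.
\end{aligned}
```

A proper, weakly informative prior such as the half-Cauchy keeps the posterior proper for any J.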