Bayesian Model Averaging

Bayesian model averaging accounts for uncertainty about which model is correct by combining the predictions of all candidate models, weighted by their posterior probabilities.

Definition

Bayesian model averaging forms predictions and inferences by taking a weighted average over a set of candidate models, with weights equal to the posterior probability of each model given the data, thereby incorporating model uncertainty into the final answer.
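
In symbols, the definition can be written as below. The notation is introduced here for illustration and is not from the original text: $M_1, \dots, M_K$ are the candidate models, $D$ is the observed data, $\tilde{y}$ is a future observation, and $\theta_k$ are the parameters of model $M_k$.

```latex
p(\tilde{y} \mid D) = \sum_{k=1}^{K} p(\tilde{y} \mid M_k, D)\, p(M_k \mid D),
\qquad
p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{j=1}^{K} p(D \mid M_j)\, p(M_j)},
\qquad
p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k .
```

The first expression is the averaged prediction, the second gives the weights (posterior model probabilities), and the third is the marginal likelihood that each weight depends on.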

Scope

This topic covers the formulation of model averaging over a model space, posterior model probabilities as weights, its benefit for calibrated prediction under model uncertainty, the practical challenges of large model spaces, and predictive alternatives such as stacking.

Core questions

How are predictions averaged across models using posterior model probabilities?
Why does model averaging improve predictive calibration under model uncertainty?
How are large or infinite model spaces handled in practice?
How does stacking differ from posterior-probability weighting?

Key concepts

posterior model probability
model space
model uncertainty
predictive averaging
stacking
Occam's window
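
As a rough illustration of Occam's window, the sketch below keeps only models whose posterior probability is within a fixed factor of the best model's, then renormalizes the weights. It covers only this symmetric pruning rule. The function name, the model names, the log scores, and the cutoff of 20 are all illustrative assumptions, not values from the text above.

```python
import math

def occams_window(log_post, cutoff=20.0):
    """Prune models far less probable than the best one, then renormalize.

    log_post maps a model name to its unnormalized log posterior model
    probability (log marginal likelihood plus log prior).
    cutoff is the largest allowed ratio best/model (assumed default: 20).
    """
    best = max(log_post.values())
    threshold = best - math.log(cutoff)
    kept = {name: lp for name, lp in log_post.items() if lp >= threshold}
    # Subtract the best score before exponentiating for numerical stability.
    total = sum(math.exp(lp - best) for lp in kept.values())
    return {name: math.exp(lp - best) / total for name, lp in kept.items()}

# Hypothetical log scores for three candidate models.
example_scores = {"m1": -100.0, "m2": -101.0, "m3": -110.0}
window_weights = occams_window(example_scores)
print(window_weights)
```

With these made-up scores, the third model falls outside the window and the remaining weight is shared between the first two.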

Key theories

Averaging over the model space: Treating the model index as an unknown quantity with its own posterior distribution yields predictions that integrate over models. If the true model is in the candidate set, this averaged prediction is optimal in expectation, for example under the logarithmic scoring rule.
Predictive stacking: When no candidate is exactly correct, stacking chooses combination weights to maximize cross-validated predictive performance, often outperforming posterior-probability weighting in practice.
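
The stacking idea can be sketched as follows: given each model's leave-one-out predictive density at every data point, find weights on the simplex that maximize the summed log density of the weighted mixture. This sketch uses a simple EM-style fixed-point update, chosen for brevity and not necessarily the optimizer used in published implementations. The function names and the density values are hypothetical.

```python
import math

def log_score(weights, loo_dens):
    """Summed log predictive density of the weighted mixture."""
    return sum(
        math.log(sum(w * p for w, p in zip(weights, row)))
        for row in loo_dens
    )

def stacking_weights(loo_dens, n_iter=500):
    """Approximate stacking weights by an EM-style fixed-point iteration.

    loo_dens[i][k] is the leave-one-out predictive density of
    observation i under model k (assumed to be computed elsewhere).
    """
    n = len(loo_dens)
    k = len(loo_dens[0])
    w = [1.0 / k] * k  # start in the interior; zero weights would stay zero
    for _ in range(n_iter):
        new_w = [0.0] * k
        for row in loo_dens:
            mix = sum(wj * pj for wj, pj in zip(w, row))
            for j in range(k):
                new_w[j] += w[j] * row[j] / mix
        w = [v / n for v in new_w]
    return w

# Made-up densities: model 0 predicts three points well, model 1 the fourth.
loo_dens_example = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]
stack_w = stacking_weights(loo_dens_example)
print(stack_w, log_score(stack_w, loo_dens_example))
```

Because the objective is concave in the weights, the iteration started from equal weights moves toward the maximizing combination. Here neither model receives all the weight, since each one predicts some points better than the other.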

Practical relevance

Model averaging produces more honest predictive uncertainty in fields such as climate projection, epidemiological forecasting, and economics, where committing to a single model would understate the true uncertainty.

History

Bayesian model averaging was developed through the 1990s and synthesized in the 1999 tutorial by Hoeting and colleagues. Recognition that the true model is rarely in the candidate set later motivated predictive stacking as a more robust combination method.

Debates

Model-probability weighting versus stacking: When all candidate models are wrong, posterior-probability weights can concentrate on a single poor model, so predictive stacking is increasingly preferred for combining models for prediction.
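
The concentration effect can be seen in a small numerical sketch. Posterior model probabilities are proportional to the exponentiated log marginal likelihood (plus log prior), so a gap in log marginal likelihood that grows with the sample size pushes nearly all weight onto one model, whether or not that model is adequate. The per-observation gap of 0.05 and the function name are hypothetical.

```python
import math

def bma_weights(log_evidence, log_prior=None):
    """Posterior model probabilities from log marginal likelihoods.

    log_prior defaults to equal prior weight on every model
    (an assumption made for this example).
    """
    k = len(log_evidence)
    if log_prior is None:
        log_prior = [0.0] * k  # equal priors; the constant cancels
    scores = [le + lp for le, lp in zip(log_evidence, log_prior)]
    top = max(scores)
    # Subtract the maximum before exponentiating to avoid underflow.
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical: model A's log marginal likelihood exceeds model B's
# by 0.05 per observation, so the gap grows linearly with n.
for n in (10, 100, 1000):
    weights = bma_weights([-1.00 * n, -1.05 * n])
    print(n, [round(w, 4) for w in weights])
```

A modest per-observation advantage gives a moderate weight at small n and an almost degenerate weight at large n, which is the behavior that motivates stacking when every candidate is misspecified.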

Key figures

Adrian Raftery
David Madigan
Jennifer Hoeting
Andrew Gelman

Seminal works

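hoeting1999: Hoeting, Madigan, Raftery, and Volinsky (1999), "Bayesian Model Averaging: A Tutorial," Statistical Science.
yao2018: Yao, Vehtari, Simpson, and Gelman (2018), "Using Stacking to Average Bayesian Predictive Distributions," Bayesian Analysis.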

Frequently asked questions

Why not just pick the single best model? Selecting one model ignores the uncertainty about which model is correct and can produce overconfident predictions; averaging over models, or stacking them, propagates that uncertainty and usually improves predictive calibration.
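
A small worked example may make the overconfidence point concrete. The variance of an averaged prediction is the weighted within-model variance plus a between-model term that reflects disagreement among the models' means. Selecting a single model discards the between-model term. The function name and all numbers below are made up for illustration.

```python
def averaged_mean_and_variance(weights, means, variances):
    """Mean and variance of a weighted mixture of model predictions."""
    mean = sum(w * m for w, m in zip(weights, means))
    # Average of each model's own predictive variance.
    within = sum(w * v for w, v in zip(weights, variances))
    # Extra spread caused by the models disagreeing about the mean.
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return mean, within + between

# Hypothetical: two models with weights 0.6 and 0.4, predictive
# means 1.0 and 3.0, and predictive variance 1.0 each.
avg_mean, avg_var = averaged_mean_and_variance(
    [0.6, 0.4], [1.0, 3.0], [1.0, 1.0]
)
print(avg_mean, avg_var)
```

In this example the higher-weighted model alone would report a variance of 1.0, while the averaged prediction has a variance of about 1.96 because the two models disagree about the mean.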