
Bayesian Stacking Ensemble

Bayesian Stacking Ensemble (Bayesian Stacking of Predictive Distributions) · Also known as: Bayesian stacking, Bayesian model stacking, stacking with Bayesian weights, predictive distribution stacking

Bayesian stacking combines the predictive distributions of several base models by finding non-negative weights, constrained to sum to one, that maximise the leave-one-out (LOO) log predictive score of the mixture. Formalised by Yao, Vehtari, Simpson, and Gelman (2018), it yields a single predictive distribution whose estimated LOO log score is at least as high as that of the best single constituent model, because putting all weight on one model is always a feasible choice.
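The objective described above can be written out as follows. This is a sketch in the notation commonly used for the method: K base models M_1, ..., M_K, n observations y_1, ..., y_n, and y_{-i} for the data with observation i held out. The symbol S_K for the weight simplex is a labelling choice made here.

```latex
\hat{w} \;=\; \arg\max_{w \in S_K} \; \frac{1}{n} \sum_{i=1}^{n}
  \log \sum_{k=1}^{K} w_k \, p(y_i \mid y_{-i}, M_k),
\qquad
S_K = \Bigl\{ w \in [0,1]^K : \sum_{k=1}^{K} w_k = 1 \Bigr\}

\hat{p}(\tilde{y} \mid y) \;=\; \sum_{k=1}^{K} \hat{w}_k \, p(\tilde{y} \mid y, M_k)
```

The first line selects the weights; the second is the stacked predictive distribution for a new observation. Each single model corresponds to a corner of S_K, which is why the optimised LOO score cannot fall below that of the best individual model.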

When to use it

Use Bayesian stacking when you have two or more competing probabilistic models and want a principled combination that is robust to misspecification. It is especially valuable when no single model is obviously correct, when well-calibrated uncertainty is required (e.g., clinical decisions, risk forecasting), or when Bayesian models with different priors or likelihoods are being compared. It offers little when the models are nearly identical (stacking gains nothing) or when the computational cost of LOO is prohibitive for very large datasets. It is not a substitute for model criticism: poorly specified base models produce a poor mixture.

Strengths & limitations

Strengths

Produces a full predictive distribution rather than a point estimate, preserving uncertainty; calibration is typically good but still depends on the quality of the base models.
Optimises directly for predictive accuracy (LOO log score), making it theoretically sound and not dependent on marginal likelihood approximations.
More robust to model misspecification than Bayesian model averaging.
The weight optimisation is a convex problem (a concave log score maximised over the simplex), so any local optimum is also a global one.
Compatible with any collection of probabilistic base models regardless of their internal structure.
Naturally handles model redundancy: if two models are nearly identical, only their combined weight matters, and one of them may receive near-zero weight.
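The weight optimisation named in the list above can be sketched in a few lines. The block below is a minimal illustration, not the implementation used by any particular package: it assumes the LOO predictive densities are already available as a matrix (the toy values in `loo_dens` are made up), and it uses simple EM-style multiplicative updates, one of several ways to maximise this concave objective. The function names `stacking_weights` and `mean_log_score` are hypothetical.

```python
import math


def stacking_weights(loo_dens, n_iter=5000, tol=1e-12):
    """Maximise the mean of log(sum_k w_k * p_ik) over the simplex.

    loo_dens[i][k] is the LOO predictive density of model k at observation i.
    Uses EM-style updates, which keep the weights non-negative and summing to one.
    """
    n, k = len(loo_dens), len(loo_dens[0])
    w = [1.0 / k] * k  # start from uniform weights (interior of the simplex)
    for _ in range(n_iter):
        new_w = [0.0] * k
        for row in loo_dens:
            mix = sum(wj * pj for wj, pj in zip(w, row))
            for j in range(k):
                new_w[j] += w[j] * row[j] / mix
        new_w = [v / n for v in new_w]
        converged = max(abs(a - b) for a, b in zip(w, new_w)) < tol
        w = new_w
        if converged:
            break
    return w


def mean_log_score(loo_dens, w):
    """Mean LOO log predictive score of the mixture with weights w."""
    total = 0.0
    for row in loo_dens:
        total += math.log(sum(wj * pj for wj, pj in zip(w, row)))
    return total / len(loo_dens)


# Made-up LOO densities for two models that are good on different observations.
loo_dens = [
    [0.9, 0.1],
    [0.8, 0.3],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
]

weights = stacking_weights(loo_dens)
stacked = mean_log_score(loo_dens, weights)
single = [mean_log_score(loo_dens, [1.0, 0.0]), mean_log_score(loo_dens, [0.0, 1.0])]
print("weights:", weights)
print("stacked score:", stacked, "single-model scores:", single)
```

Because the objective is concave, the starting point does not change the optimum reached; a general-purpose constrained optimiser would serve equally well here.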

Limitations

Requires that each base model can produce a full predictive distribution, not just point predictions.
Exact LOO computation can be expensive for large datasets unless approximated, for example with Pareto-smoothed importance sampling (PSIS-LOO).
With very few observations the LOO estimates of predictive accuracy are noisy, making weight estimates unreliable.
Does not improve poorly specified base models; garbage in, garbage out applies at the ensemble level.
Interpretation of the combined model is harder than interpreting any single base model.

Frequently asked

How is Bayesian stacking different from Bayesian model averaging (BMA)?

BMA weights models by their posterior model probabilities, which are driven by marginal likelihoods and tend to concentrate on a single model as the sample size grows. Bayesian stacking instead optimises LOO predictive accuracy, so it can keep weight on several complementary models when their mixture predicts better than any one of them. This makes it more robust when none of the candidate models is the true data-generating process.
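The concentration of BMA weights can be illustrated with a small calculation. The sketch below assumes equal prior model probabilities and a made-up scenario in which one model's log marginal likelihood exceeds the other's by a fixed amount per observation; `bma_weights` and `gap_per_obs` are hypothetical names introduced for illustration.

```python
import math


def bma_weights(log_marginal_likelihoods):
    """Posterior model probabilities under equal prior model probabilities."""
    m = max(log_marginal_likelihoods)  # subtract the max for numerical stability
    unnorm = [math.exp(l - m) for l in log_marginal_likelihoods]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Assumption: model A's log evidence exceeds model B's by 0.05 per observation.
gap_per_obs = 0.05
for n in (10, 100, 1000):
    print(n, bma_weights([n * gap_per_obs, 0.0]))
```

Even a small per-observation advantage compounds with n, so the weight on the favoured model approaches one, whether or not the other model captures something the first misses.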

How many base models do I need?

Even two or three diverse models can benefit from stacking. The key is diversity: the models should make different errors in different regions of the data. Adding nearly identical models wastes computation without improving the mixture.

Is LOO always required, or can I use k-fold cross-validation?

LOO is theoretically ideal and is often approximated efficiently via PSIS-LOO. For large datasets, k-fold cross-validation (e.g., 10-fold) can be used to estimate the log predictive scores needed for weight optimisation, at the cost of some approximation.
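A k-fold estimate of the per-observation log predictive densities might be computed as in the sketch below. It is an illustration under simplifying assumptions: the base model is a plug-in normal distribution fitted by its sample mean and standard deviation (not a full posterior predictive), the folds are interleaved rather than randomised, and the data and function names (`kfold_log_densities`, `normal_fit`, `normal_logpdf`) are made up for the example.

```python
import math
import statistics


def normal_fit(train):
    """Plug-in fit: sample mean and (population) standard deviation."""
    return statistics.fmean(train), statistics.pstdev(train)


def normal_logpdf(y, params):
    mu, sigma = params
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


def kfold_log_densities(y, n_folds, fit, logpdf):
    """Return one held-out log predictive density per observation."""
    out = [0.0] * len(y)
    for fold in range(n_folds):
        test_idx = range(fold, len(y), n_folds)  # interleaved fold assignment
        held_out = set(test_idx)
        train = [v for i, v in enumerate(y) if i not in held_out]
        params = fit(train)  # fit only on data outside the fold
        for i in test_idx:
            out[i] = logpdf(y[i], params)
    return out


# Made-up data for illustration.
y = [1.2, 0.7, 1.9, 1.1, 0.3, 1.6, 0.9, 1.4, 2.2, 0.8]
lpd = kfold_log_densities(y, 5, normal_fit, normal_logpdf)
print(lpd)
```

Running this once per base model and exponentiating gives the columns of the density matrix that the weight optimisation needs. Setting `n_folds` equal to `len(y)` reduces it to exact LOO.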

Can I use Bayesian stacking with non-Bayesian base models?

Yes, if the non-Bayesian models can produce probabilistic predictions (e.g., calibrated probabilities from a random forest or a Gaussian process). The stacking optimisation only requires predictive densities, not a full posterior.

What software supports Bayesian stacking?

The loo R package (Vehtari et al.) provides loo_model_weights(), which uses stacking as its default method. In Python, ArviZ supports LOO and offers stacking weights through its compare() function; Stan and PyMC models integrate naturally as base models.

Sources

Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions. Bayesian Analysis, 13(3), 917–1007. DOI: 10.1214/17-BA1091
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259. DOI: 10.1016/S0893-6080(05)80023-1

How to cite this page

ScholarGate. (2026, June 3). Bayesian Stacking Ensemble (Bayesian Stacking of Predictive Distributions). ScholarGate. https://scholargate.app/en/machine-learning/bayesian-stacking-ensemble


Related reference concepts

Bayesian Model Averaging
Bayesian Model Comparison and Selection
Predictive Information Criteria
Hierarchical Bayesian Models
Bayes Factors and Marginal Likelihood
Multilevel and Partial Pooling Models
