Ensemble Methods

Ensemble methods combine many individual models into a single predictor, reducing variance or bias so that the combined model is often more accurate than any single member.

Definition

An ensemble method trains a collection of base models and combines their predictions, for example by averaging or weighted voting. Bagging-style ensembles reduce variance by averaging over randomized models, while boosting-style ensembles reduce bias by fitting models sequentially, each one emphasizing the examples that the current ensemble predicts poorly.
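The bagging half of this definition can be illustrated with a minimal sketch. It is not a reference implementation: the helper names `fit_stump` and `bagged_stumps`, the one-split regression stump used as the base model, and the toy step-function data are all assumptions chosen to keep the example short and dependency-free.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D inputs by least squares.

    Hypothetical helper for illustration; returns a function x -> prediction.
    """
    best = None
    # Candidate thresholds are the observed x values; the largest is skipped
    # because it would leave the right-hand side of the split empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    if best is None:  # every x is identical: fall back to a constant model
        constant = mean(ys)
        return lambda x: constant
    _, t, l_mean, r_mean = best
    return lambda x: l_mean if x <= t else r_mean


def bagged_stumps(xs, ys, n_models=50, seed=0):
    """Train stumps on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap resample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # The ensemble prediction is the plain average over all base models.
    return lambda x: mean([m(x) for m in models])


if __name__ == "__main__":
    noise = random.Random(42)
    xs = [i / 10 for i in range(40)]
    # Noisy step function: about 0 below x = 2, about 1 from x = 2 onward.
    ys = [(0.0 if x < 2.0 else 1.0) + noise.gauss(0, 0.1) for x in xs]
    predict = bagged_stumps(xs, ys)
    print(predict(0.5), predict(1.95), predict(3.5))
```

Each base model is trained independently of the others, so the loop over resamples could be run in parallel. Near the step at x = 2 the individual stumps disagree about where to split, and the averaged prediction changes more gradually than any single stump does.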

Scope

This topic covers techniques that aggregate multiple learners: bagging (bootstrap aggregation), random forests that randomize both data and features, and boosting methods such as AdaBoost and gradient boosting that fit models sequentially to correct prior errors. It addresses why ensembles reduce error, the bias-variance effects of averaging versus boosting, and the role of model diversity.

Core questions

Why does combining many models often beat the best single model?
How do bagging and boosting differ in what error they reduce?
What role does diversity among base learners play?
How does gradient boosting fit additive models stage by stage?

Key theories

Bagging and variance reduction: Averaging the predictions of models trained on bootstrap resamples reduces variance without substantially increasing bias. The effect is strongest for unstable, high-variance base learners such as deep decision trees.
Random forests: Random forests build many decorrelated trees by resampling data and randomly restricting the features considered at each split, yielding a robust, accurate ensemble with built-in estimates of error and feature importance.
Boosting as additive modeling: Boosting fits base learners sequentially, each correcting the residual errors of the current ensemble, which can be understood as stagewise minimization of a loss function and tends to reduce bias.
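The stagewise view of boosting can be sketched for the simplest case, squared-error loss, where the negative gradient of the loss is just the residual. This is a minimal sketch under stated assumptions: the helper names `fit_stump` and `gradient_boost`, the stump base learner, and the `n_rounds` and `learning_rate` parameters are hypothetical choices for illustration, not a description of any particular library.

```python
from statistics import mean


def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs (hypothetical helper)."""
    best = None
    # Skip the largest x as a threshold so the right side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, targets) if x <= t]
        right = [y for x, y in zip(xs, targets) if x > t]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    if best is None:  # every x is identical: fall back to a constant model
        constant = mean(targets)
        return lambda x: constant
    _, t, l_mean, r_mean = best
    return lambda x: l_mean if x <= t else r_mean


def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Stagewise additive fit for squared-error loss."""
    f0 = mean(ys)                 # stage 0: the best constant prediction
    preds = [f0] * len(xs)        # current ensemble output on the training set
    stumps = []
    for _ in range(n_rounds):
        # For loss 0.5 * (y - F(x))^2 the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        # Take a shrunken step in the direction of the new base learner.
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in stumps)


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [i / 10 for i in range(30)]
    ys = [x * x for x in xs]
    baseline = mean(ys)
    model = gradient_boost(xs, ys)
    print("constant model MSE:", mse(lambda x: baseline, xs, ys))
    print("boosted model MSE: ", mse(model, xs, ys))
```

Unlike the bagging case, each stage depends on the predictions of all earlier stages, so the loop cannot be parallelized across rounds. The learning rate trades off how many rounds are needed against how aggressively each stage corrects the current ensemble.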

Practical relevance

Tree-based ensembles, especially random forests and gradient-boosted trees, are among the most reliably accurate methods for tabular data; they routinely win prediction competitions and power industrial systems. Their built-in measures of feature importance also help indicate which inputs drive the model's predictions.

History

Bagging was introduced by Breiman in 1996. AdaBoost, developed by Freund and Schapire in the same period, demonstrated that weak learners could be boosted into strong ones. Breiman's random forests in 2001 and Friedman's gradient boosting machines unified and extended these ideas, making ensembles a standard approach for prediction on structured (tabular) data.

Key figures

Leo Breiman
Robert Schapire
Yoav Freund
Jerome Friedman

Seminal works

breiman2001
hastie2009
freund1997

Frequently asked questions

What is the difference between bagging and boosting? Bagging trains base models independently on resampled data and averages them to reduce variance. Boosting trains models sequentially, with each new model focusing on the errors of the current ensemble, which reduces bias. Bagging parallelizes naturally; boosting is inherently sequential.
Why do random forests rarely overfit badly? Each tree is grown on a different bootstrap sample and considers only a random subset of features at each split, so the trees are decorrelated. Averaging many decorrelated trees cancels much of their individual variance, and adding more trees does not by itself increase overfitting, because the ensemble's error converges as the forest grows.
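
The effect of decorrelation can be made concrete with a standard identity for the variance of an average. The symbols are assumptions introduced here for illustration: B is the number of trees, T_b(x) is the prediction of tree b at a point x, and the trees are idealized as identically distributed with common variance \sigma^2 and common pairwise correlation \rho.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \frac{1}{B^{2}}\Bigl(B\,\sigma^{2} + B(B-1)\,\rho\,\sigma^{2}\Bigr)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Under this idealization the second term shrinks toward zero as B grows, which is why adding trees does not hurt, while the first term depends only on how correlated the trees are. Restricting the features considered at each split is a way of lowering \rho, and it is this term that sets the floor on the variance of the forest.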