Bootstrap and Resampling
The bootstrap estimates the sampling distribution of a statistic by resampling the observed data, replacing intractable formulas with computation.
Definition
The bootstrap is a resampling method that approximates the sampling distribution of a statistic by treating the observed sample as the population and repeatedly drawing samples from it, usually with replacement, to estimate standard errors, confidence intervals, and bias.
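A minimal sketch of the nonparametric bootstrap standard error, assuming NumPy is available. The function name `bootstrap_se` and the default of 2000 replicates are illustrative choices, not a standard API. It treats the sample as the population, draws resamples of the same size with replacement, recomputes the statistic on each, and reports the standard deviation of the replicates. For the sample mean, the ideal bootstrap SE equals the plug-in standard deviation divided by the square root of n, which gives a convenient sanity check.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of `statistic` (illustrative helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Each row holds the indices of one resample, drawn with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    replicates = np.array([statistic(data[i]) for i in idx])
    # The spread of the replicates estimates the statistic's sampling SD.
    return replicates.std(ddof=1)

# Usage: the median has no simple closed-form standard error.
rng = np.random.default_rng(42)
sample = rng.exponential(scale=1.0, size=50)
print(bootstrap_se(sample, np.median))
```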
Scope
This topic covers the nonparametric bootstrap by resampling with replacement, the parametric bootstrap, the jackknife and its bias and variance estimates, permutation tests, bootstrap standard errors and the percentile, bias-corrected, and bootstrap-t confidence intervals, the consistency of the bootstrap and its higher-order accuracy via Edgeworth expansions, and well-known cases such as the sample maximum where the ordinary bootstrap fails.
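The jackknife estimates listed above can be sketched with the standard leave-one-out formulas: with theta_(i) the statistic recomputed without observation i and theta_(.) their average, the bias estimate is (n - 1)(theta_(.) - theta_hat) and the variance estimate is (n - 1)/n times the sum of squared deviations of the theta_(i) from theta_(.). The helper name `jackknife` is hypothetical and NumPy is assumed. As a check, bias-correcting the plug-in variance (divisor n) recovers the usual unbiased variance (divisor n - 1) exactly, and the jackknife variance of the mean equals s^2/n.

```python
import numpy as np

def jackknife(data, statistic):
    """Jackknife bias-corrected estimate, bias, and variance (illustrative helper)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    theta_hat = statistic(data)
    # theta_(i): the statistic recomputed with observation i deleted.
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    theta_dot = loo.mean()
    bias = (n - 1) * (theta_dot - theta_hat)
    variance = (n - 1) / n * np.sum((loo - theta_dot) ** 2)
    return theta_hat - bias, bias, variance

x = np.array([2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8])
# Plug-in variance uses ddof=0; its jackknife correction matches ddof=1.
corrected, bias, var = jackknife(x, lambda d: d.var())
print(corrected, x.var(ddof=1))
```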
Core questions
- How does resampling the data approximate the sampling distribution of a statistic?
- How are bootstrap confidence intervals constructed, and how do percentile and bootstrap-t intervals differ?
- When is the bootstrap consistent, and when does it fail?
- How does a permutation test use resampling to obtain an exact distribution-free test?
Key theories
- The bootstrap principle
- Replacing the unknown population distribution with the empirical distribution of the sample and resampling from it allows the sampling variability of almost any statistic to be estimated by simulation, even when no closed-form sampling distribution exists.
- Bootstrap consistency and accuracy
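- For smooth statistics the bootstrap is consistent, and Edgeworth expansions show that some bootstrap intervals, notably the bootstrap-t, are second-order accurate: their one-sided coverage error is of order n^{-1}, compared with n^{-1/2} for the normal approximation. For non-smooth functionals such as the maximum, the bootstrap can fail.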
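A sketch contrasting the percentile and bootstrap-t intervals for a mean, assuming NumPy. The percentile interval takes quantiles of the bootstrap replicates directly; the bootstrap-t studentizes each replicate, (theta* - theta_hat)/se*, and pivots its quantiles around theta_hat. The mean is chosen here (an assumption for simplicity) because its standard error s/sqrt(n) has a closed form, so each replicate can be studentized without a nested bootstrap. The function name and defaults are illustrative.

```python
import numpy as np

def percentile_and_t_intervals(data, alpha=0.05, n_boot=4000, seed=0):
    """Percentile and bootstrap-t intervals for the mean (illustrative helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    theta_hat = data.mean()
    se_hat = data.std(ddof=1) / np.sqrt(n)

    # Resample rows of indices with replacement: shape (n_boot, n).
    boot = data[rng.integers(0, n, size=(n_boot, n))]
    theta_star = boot.mean(axis=1)
    se_star = boot.std(axis=1, ddof=1) / np.sqrt(n)

    # Percentile interval: quantiles of the replicates themselves.
    lo_p, hi_p = np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])

    # Bootstrap-t: quantiles of the studentized replicates, then pivot.
    t_star = (theta_star - theta_hat) / se_star
    t_lo, t_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    lo_t, hi_t = theta_hat - t_hi * se_hat, theta_hat - t_lo * se_hat
    return (lo_p, hi_p), (lo_t, hi_t)

# Usage on skewed data, where the two intervals typically differ.
rng = np.random.default_rng(1)
skewed = rng.lognormal(size=30)
pct, boot_t = percentile_and_t_intervals(skewed)
print(pct, boot_t)
```

Note that the bootstrap-t subtracts the upper t-quantile to get the lower endpoint; this reversal is what lets it correct for skewness that the percentile interval only partly captures.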
Clinical relevance
The bootstrap supplies standard errors and confidence intervals for complex estimators, such as medians, correlations, and machine-learning predictions, where analytic formulas are unavailable. Permutation tests give significance levels that are exact under the null hypothesis of exchangeable group labels and are widely used in genomics and randomized experiments.
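A minimal two-sample permutation test for a difference in means, assuming NumPy. Under the null hypothesis the group labels are exchangeable, so the labels are shuffled and the statistic recomputed. With small groups one could enumerate every relabelling to get the exact null distribution; this sketch instead samples random relabellings, a Monte Carlo approximation. The function name and the data values are hypothetical illustrations.

```python
import numpy as np

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for mean(x) - mean(y)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = x.mean() - y.mean()
    count = 0
    for _ in range(n_perm):
        # Shuffling the pooled data reassigns group labels at random.
        perm = rng.permutation(pooled)
        stat = perm[:n_x].mean() - perm[n_x:].mean()
        # Small tolerance guards against floating-point ties with the observed value.
        if abs(stat) >= abs(observed) - 1e-12:
            count += 1
    # Adding 1 counts the observed labelling and keeps the p-value valid.
    return (count + 1) / (n_perm + 1)

# Hypothetical treatment and control measurements.
treated = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.2, 11.8, 12.9, 11.5, 12.4]
print(permutation_test(treated, control))
```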
History
Quenouille introduced the jackknife as a bias-reduction device around 1950, and Tukey named it and extended it to variance estimation in 1958. Efron introduced the bootstrap in 1979, unifying and extending these resampling ideas, and Hall's work in the 1980s and 1990s established its higher-order accuracy through Edgeworth expansions.
Key figures
- Bradley Efron
- Robert Tibshirani
- Peter Hall
- Maurice Quenouille
Related topics
Seminal works
- efron1979
Frequently asked questions
- Does the bootstrap create new information from nothing?
- No. It reuses the information already in the sample to approximate sampling variability; it cannot improve on a poor or biased sample, and its accuracy depends on the original sample representing the population well.
- When does the bootstrap fail?
- It can fail for statistics that depend non-smoothly on the distribution, such as the sample maximum, or for parameters on the boundary of the parameter space; in such cases modified schemes such as subsampling or the m-out-of-n bootstrap are used instead.
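A small demonstration of why the ordinary bootstrap fails for the sample maximum, assuming NumPy and continuous data (so the maximum is unique). A resample's maximum equals the sample maximum whenever the largest observation is drawn at least once, which happens with probability 1 - (1 - 1/n)^n, close to 1 - 1/e (about 0.63) for large n. The bootstrap distribution therefore has a large atom at the observed maximum, whereas the true sampling distribution of the maximum is continuous. Drawing only m observations with m much smaller than n shrinks the atom to 1 - (1 - 1/n)^m, which is the idea behind the m-out-of-n bootstrap. The helper name and the choice m = 10 are illustrative; this sketch shows only the atom, not a full m-out-of-n interval.

```python
import numpy as np

def atom_at_sample_max(data, m=None, n_boot=20000, seed=0):
    """Fraction of size-m resamples whose maximum equals the sample maximum."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    m = n if m is None else m
    resamples = data[rng.integers(0, n, size=(n_boot, m))]
    return np.mean(resamples.max(axis=1) == data.max())

rng = np.random.default_rng(7)
n = 100
sample = rng.uniform(0.0, 1.0, size=n)  # Uniform(0, 1): true maximum parameter is 1

full = atom_at_sample_max(sample)          # ordinary n-out-of-n bootstrap
theory = 1 - (1 - 1 / n) ** n              # probability of the atom, about 0.634 here
small = atom_at_sample_max(sample, m=10)   # m-out-of-n with m much smaller than n
print(full, theory, small)
```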