Bootstrap Methods

The bootstrap estimates the sampling distribution of a statistic by repeatedly drawing samples with replacement from the observed data and recomputing the statistic on each resample.

Definition

The bootstrap is a resampling method that approximates the sampling distribution of an estimator by the distribution of the estimator recomputed over many samples drawn with replacement from the empirical distribution of the data.
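The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper name `bootstrap_replicates`, the synthetic exponential data, and the choice of the median as the statistic are all assumptions made for the example.

```python
import numpy as np

def bootstrap_replicates(data, statistic, n_resamples=1000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement from `data`.

    Hypothetical helper written for this example only.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    reps = np.empty(n_resamples)
    for b in range(n_resamples):
        # Drawing n indices uniformly with replacement is the same as
        # sampling n points from the empirical distribution of the data.
        idx = rng.integers(0, n, size=n)
        reps[b] = statistic(data[idx])
    return reps

# Synthetic data for illustration only (not taken from any study).
rng = np.random.default_rng(42)
sample = rng.exponential(scale=2.0, size=50)

reps = bootstrap_replicates(sample, np.median, n_resamples=2000)
se_median = reps.std(ddof=1)  # bootstrap standard error of the median
print(f"sample median = {np.median(sample):.3f}, bootstrap SE = {se_median:.3f}")
```

The spread of `reps` stands in for the sampling distribution of the median; its standard deviation is the bootstrap standard error.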

Scope

This topic covers the nonparametric bootstrap and the plug-in principle, parametric and smoothed variants, the construction of confidence intervals (percentile, basic, bias-corrected and accelerated, and bootstrap-t), bootstrap standard errors and bias estimates, and adaptations for regression and dependent data such as the block bootstrap. Limitations and consistency conditions are emphasized.

Core questions

How does sampling with replacement from the data approximate the true sampling distribution?
How are bootstrap standard errors and bias estimates computed?
What distinguishes percentile, bootstrap-t, and bias-corrected and accelerated (BCa) confidence intervals?
When is the bootstrap consistent, and how is it adapted to regression and dependent data?

Key concepts

Sampling with replacement
Empirical distribution
Bootstrap standard error
Percentile interval
Bias-corrected and accelerated interval
Block bootstrap

Key theories

Plug-in resampling: Replacing the population distribution by the empirical distribution and resampling from it yields a Monte Carlo approximation to the sampling distribution of a statistic, from which standard errors and biases follow.
Bootstrap confidence intervals: Quantiles of the bootstrap distribution give percentile intervals; refinements such as the bias-corrected and accelerated and bootstrap-t intervals improve coverage by correcting for bias and skewness in the estimator's distribution.
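As a rough sketch of the two ideas above, the following code computes a bootstrap bias estimate, a standard error, and the percentile and basic intervals for a sample mean. The lognormal data, the number of resamples `B`, and all variable names are assumptions chosen for illustration; the BCa and bootstrap-t refinements are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, right-skewed data used only for illustration.
x = rng.lognormal(mean=0.0, sigma=1.0, size=80)
theta_hat = x.mean()  # plug-in estimate on the original sample

# Draw all B resamples at once: each row holds n indices sampled with replacement.
B = 4000
idx = rng.integers(0, x.size, size=(B, x.size))
theta_star = x[idx].mean(axis=1)  # statistic recomputed on each resample

# Plug-in estimates of bias and standard error.
bias_hat = theta_star.mean() - theta_hat
se_hat = theta_star.std(ddof=1)

# 95% intervals from the quantiles of the bootstrap distribution.
alpha = 0.05
q_lo, q_hi = np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])
percentile_ci = (q_lo, q_hi)
# The basic interval reflects the bootstrap quantiles around theta_hat.
basic_ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)

print(f"estimate={theta_hat:.3f} bias={bias_hat:.4f} se={se_hat:.3f}")
print("percentile CI:", percentile_ci)
print("basic CI:     ", basic_ci)
```

The two intervals have the same width but are placed differently when the bootstrap distribution is skewed, which is the asymmetry that the BCa and bootstrap-t refinements aim to handle more carefully.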

Clinical relevance

The bootstrap supplies standard errors and confidence intervals for estimators with no convenient closed-form variance, such as medians, correlation coefficients, and complex model outputs, and is routinely used to quantify uncertainty in biostatistics, econometrics, and machine learning.

History

Efron introduced the bootstrap in 1979 as a generalization of the jackknife; subsequent work developed refined confidence intervals, established consistency theory, and produced variants for regression, time series and other dependent-data settings.

Debates

When the bootstrap fails: The ordinary nonparametric bootstrap can be inconsistent for statistics governed by extreme values, for parameters on the boundary of the parameter space, and under strong dependence, prompting corrections such as the m-out-of-n bootstrap and subsampling.
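One way to see the extreme-value problem is to bootstrap the sample maximum. A resample of size n contains the original maximum with probability 1 - (1 - 1/n)^n, which is about 0.63 for large n, so the bootstrap distribution of the maximum has a large point mass at the observed value instead of a smooth approximation. The sketch below checks this numerically and compares it with an m-out-of-n resample; the uniform data and the choice m = floor(sqrt(n)) are assumptions made for illustration, not a recommended rule.

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 200, 5000

# Synthetic data for illustration: the statistic of interest is the maximum.
x = rng.uniform(0.0, 1.0, size=n)

# Ordinary n-out-of-n bootstrap of the maximum.
idx = rng.integers(0, n, size=(B, n))
max_star = x[idx].max(axis=1)
frac_tied = np.mean(max_star == x.max())  # share of resamples that reproduce the sample max
expected = 1 - (1 - 1 / n) ** n           # theoretical probability of that event

# m-out-of-n bootstrap: resamples of size m < n (m chosen here only as an example).
m = int(np.sqrt(n))
idx_m = rng.integers(0, n, size=(B, m))
frac_tied_m = np.mean(x[idx_m].max(axis=1) == x.max())

print(f"n-out-of-n: {frac_tied:.3f} of resamples hit the sample max (theory {expected:.3f})")
print(f"m-out-of-n (m={m}): {frac_tied_m:.3f}")
```

With the smaller resample size the point mass at the observed maximum shrinks sharply, which is the intuition behind the m-out-of-n correction.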

Key figures

Bradley Efron
Robert Tibshirani
Anthony Davison
David Hinkley

Seminal works

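Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1), 1-26.
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall.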

Frequently asked questions

Why sample with replacement? Sampling with replacement lets each resample differ from the original while keeping the same size n, mimicking the variability of drawing fresh samples from the population. Without replacement, every resample of size n would just be the original data reordered, so any statistic that ignores the order of the observations would never change.
How many bootstrap resamples are needed? A few hundred usually suffice for standard errors, but confidence intervals based on tail quantiles typically need a couple of thousand or more so that the extreme quantiles are estimated stably.
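The answer to the first question can be checked directly. In this small sketch (synthetic normal data, 500 resamples, and the mean as the statistic are all assumptions for illustration), resampling without replacement at full size only permutes the data, so the recomputed mean does not vary beyond floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=30)  # synthetic data for illustration only

# Size-n resamples with replacement: the composition of each resample changes.
with_repl = [np.mean(rng.choice(x, size=x.size, replace=True)) for _ in range(500)]

# Size-n resamples without replacement: each one is a permutation of x.
without_repl = [np.mean(rng.choice(x, size=x.size, replace=False)) for _ in range(500)]

print("spread of means, with replacement:   ", np.std(with_repl))
print("spread of means, without replacement:", np.std(without_repl))
```

Only the first spread is a usable estimate of the standard error of the mean; the second is zero up to rounding because every permutation has the same mean.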