Cross-Validation

Cross-validation estimates how well a model will predict new data by repeatedly fitting it on part of the sample and measuring its error on the held-out remainder.
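The reason for holding data out is easiest to see by comparing the two error measures directly. The sketch below is a minimal illustration, assuming squared-error loss, simulated data (a sine curve plus Gaussian noise), and a deliberately flexible degree-8 polynomial fitted with NumPy's `polyfit`. The variable names are illustrative only.

```python
import numpy as np

# Illustrative simulation, not real data: smooth signal plus noise.
rng = np.random.default_rng(0)
n = 30
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)
DEGREE = 8  # deliberately flexible, so it can chase the noise

# In-sample error: fit and score on the same points.
coef_all = np.polyfit(x, y, DEGREE)
in_sample_mse = np.mean((y - np.polyval(coef_all, x)) ** 2)

# 5-fold cross-validation: each point is scored by a fit that did not use it.
folds = np.array_split(rng.permutation(n), 5)
sq_err = np.empty(n)
for test in folds:
    train = np.setdiff1d(np.arange(n), test)
    coef = np.polyfit(x[train], y[train], DEGREE)
    sq_err[test] = (y[test] - np.polyval(coef, x[test])) ** 2
cv_mse = sq_err.mean()

print(f"in-sample MSE {in_sample_mse:.3f} vs 5-fold CV MSE {cv_mse:.3f}")
```

The in-sample figure comes out smaller because the same noise the polynomial fitted is then used to score it.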

Definition

Cross-validation is a resampling procedure that estimates the out-of-sample predictive error of a model by partitioning the data into complementary subsets, fitting on some subsets and evaluating prediction error on the others, and averaging over the partitions.
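The procedure in the definition can be sketched in a few lines. This is a minimal sketch, assuming squared-error loss and an ordinary least-squares model with an intercept. The helper names `k_fold_indices`, `fit_ols`, `predict_ols` and `k_fold_cv_mse` are hypothetical and do not belong to any library.

```python
import numpy as np

# Hypothetical helpers for illustration; not a library API.
def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k disjoint, nearly equal folds."""
    return np.array_split(rng.permutation(n), k)

def fit_ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def k_fold_cv_mse(X, y, k=5, seed=0):
    """Fit on k-1 folds, score the held-out fold, and average over all points."""
    rng = np.random.default_rng(seed)
    sq_err = np.empty(len(y))
    for test in k_fold_indices(len(y), k, rng):
        train = np.setdiff1d(np.arange(len(y)), test)  # the complementary folds
        beta = fit_ols(X[train], y[train])
        sq_err[test] = (y[test] - predict_ols(beta, X[test])) ** 2
    # Every observation is predicted exactly once, by a model that never saw it.
    return sq_err.mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
print(k_fold_cv_mse(X, y, k=5))
```

Setting `k=len(y)` in the same function gives leave-one-out cross-validation.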

Scope

This topic covers leave-one-out and k-fold cross-validation, the validation-set and repeated cross-validation schemes, their use for model selection and tuning-parameter choice, the bias-variance trade-off in the error estimate, and pitfalls such as information leakage and the optimism of in-sample error. Its role in resampling-based assessment is emphasized.
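Repeated cross-validation reruns k-fold over several independent random partitions and averages the results. This reduces the dependence of the estimate on one particular split. The following is a minimal sketch, assuming squared-error loss and a straight-line fit via NumPy's `polyfit`. The function names `cv_mse_once` and `repeated_cv_mse` are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration; not a library API.
def cv_mse_once(x, y, k, rng):
    """One k-fold pass over a single random partition; pooled held-out squared error."""
    folds = np.array_split(rng.permutation(len(y)), k)
    sq_err = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        coef = np.polyfit(x[train], y[train], deg=1)  # straight-line fit
        sq_err[test] = (y[test] - np.polyval(coef, x[test])) ** 2
    return sq_err.mean()

def repeated_cv_mse(x, y, k=5, repeats=10, seed=0):
    """Repeat k-fold CV over independent random partitions; return the mean and each run."""
    rng = np.random.default_rng(seed)
    runs = np.array([cv_mse_once(x, y, k, rng) for _ in range(repeats)])
    return runs.mean(), runs

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=60)
y = 2.0 + 0.5 * x + rng.normal(size=60)
mean_mse, runs = repeated_cv_mse(x, y)
print(f"single-partition estimates {runs.min():.3f} to {runs.max():.3f}; repeated-CV mean {mean_mse:.3f}")
```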

Core questions

How does holding out data and predicting it estimate generalization error?
What trade-offs distinguish leave-one-out from k-fold cross-validation?
How is cross-validation used to select models and tune hyperparameters?
What practices, such as avoiding information leakage, are needed for valid estimates?
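A common form of information leakage is choosing features on the full dataset and only then cross-validating. The sketch below contrasts that with repeating the selection inside each training fold. It assumes pure-noise data, so no predictor can have an expected test MSE below var(y) = 1. It also assumes a simple "keep the 10 features most correlated with y" screen followed by least squares. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration; not a library API.
def top_k_by_corr(X, y, k):
    """Indices of the k columns of X with largest absolute correlation with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.argsort(-np.abs(corr))[:k]

def ols_predict(X_train, y_train, X_test):
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def cv_mse(X, y, n_keep, leaky, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    # Leaky: screen features once on ALL rows, including every future test fold.
    global_cols = top_k_by_corr(X, y, n_keep) if leaky else None
    sq_err = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        # Honest: redo the screening using the training fold only.
        cols = global_cols if leaky else top_k_by_corr(X[train], y[train], n_keep)
        pred = ols_predict(X[train][:, cols], y[train], X[test][:, cols])
        sq_err[test] = (y[test] - pred) ** 2
    return sq_err.mean()

# Pure noise: no column of X is related to y.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2000))
y = rng.normal(size=50)
leaky_mse = cv_mse(X, y, n_keep=10, leaky=True)
honest_mse = cv_mse(X, y, n_keep=10, leaky=False)
print(f"leaky CV MSE {leaky_mse:.3f} vs honest CV MSE {honest_mse:.3f}")
```

The general rule the sketch illustrates is that every data-dependent step, such as screening, scaling or imputation, belongs inside the loop and is fitted on training rows only.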

Key concepts

k-fold partitioning
Leave-one-out cross-validation
Validation set
Generalization error
Model selection
Information leakage

Key theories

Cross-validatory assessment: Fitting on one part of the data and evaluating on a disjoint part gives an estimate of prediction error that, averaged over folds, approximates the model's error on independent future data.
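Bias-variance in the error estimate: Leave-one-out cross-validation is nearly unbiased, because each fit uses n-1 observations. It can have high variance, however, because its n training sets are nearly identical and the resulting fold errors are highly correlated. k-fold with moderate k trains each model on only about (k-1)/k of the data, which adds a small upward (pessimistic) bias, but it averages less correlated fold errors and so has lower variance. This trade-off guides the common choice of five or ten folds.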
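Written out, the K-fold estimate takes the following form. The notation is close to that of Hastie et al. (2009). Here κ(i) is the fold containing observation i, \hat f^{-k} is the model fitted with fold k removed, n_k is the size of fold k, and L is the loss (for example, squared error).

```latex
\mathrm{CV}_{(K)} \;=\; \frac{1}{n}\sum_{i=1}^{n} L\!\left(y_i,\; \hat f^{-\kappa(i)}(x_i)\right),
\qquad \kappa : \{1,\dots,n\} \to \{1,\dots,K\}

\text{each } \hat f^{-k} \text{ is fitted to } n - n_k \approx \tfrac{K-1}{K}\,n \text{ observations;}
\quad K = n \text{ gives leave-one-out, with every fit using } n-1 \text{ observations.}
```

The smaller training size when K is small is the source of the upward bias described above.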

Practical relevance

Cross-validation is the standard tool for choosing among models, tuning regularization and other hyperparameters, and reporting honest predictive performance; it is central to statistical learning and machine-learning practice across the data-driven sciences.
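Tuning a regularization parameter by cross-validation amounts to scoring each candidate value on the same folds and keeping the best. The following is a minimal sketch, assuming ridge regression, used here only as an example of a regularized model. It also assumes an unpenalized intercept, squared-error loss and a log-spaced grid of λ values. The function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration; not a library API.
def ridge_fit(X, y, lam):
    """Ridge coefficients on centred data; the intercept is left unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta

def cv_mse_ridge(X, y, lam, folds):
    sq_err = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta, b0 = ridge_fit(X[train], y[train], lam)  # centring uses training rows only
        sq_err[test] = (y[test] - (b0 + X[test] @ beta)) ** 2
    return sq_err.mean()

def choose_lambda(X, y, grid, k=10, seed=0):
    """Score every candidate on the SAME partition; return the lowest-error value."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = np.array([cv_mse_ridge(X, y, lam, folds) for lam in grid])
    return grid[int(np.argmin(scores))], scores

rng = np.random.default_rng(7)
n, p = 60, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0  # only five active predictors in this simulation
y = X @ beta_true + rng.normal(scale=2.0, size=n)
grid = np.logspace(-3, 3, 13)
best_lam, scores = choose_lambda(X, y, grid)
print(best_lam)
```

After λ is chosen, the model is usually refitted on all the data with that value. The minimum CV score is itself optimistically biased by the selection, so an honest performance figure for the whole tuning procedure needs an outer layer of held-out data, as in nested cross-validation.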

History

Cross-validatory ideas were formalized in the mid-1970s by Stone (1974) and Geisser (1975) as a principled way to assess and choose predictive models. The rapid growth of statistical and machine learning later made k-fold cross-validation a routine default for model evaluation.

Debates

Bias and variance of the cross-validation estimate: There is continuing discussion of how many folds to use and how to obtain valid uncertainty estimates for cross-validated error. The test folds are disjoint, but the training sets overlap, so the per-fold error estimates are correlated.
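The difficulty can be seen in the standard error that is often attached to a cross-validated error in practice. It is computed from the K per-fold scores as if they were independent. The sketch below assumes least squares, squared-error loss and equal-sized folds, and computes that naive quantity. The function name `fold_mses` is hypothetical. Because the fold scores are correlated, the result should be read as a rough guide rather than a calibrated interval.

```python
import numpy as np

# Hypothetical helper for illustration; not a library API.
def fold_mses(X, y, k=10, seed=0):
    """Per-fold held-out MSE of a least-squares fit (intercept included)."""
    rng = np.random.default_rng(seed)
    out = []
    for test in np.array_split(rng.permutation(len(y)), k):
        train = np.setdiff1d(np.arange(len(y)), test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ beta
        out.append(np.mean((y[test] - pred) ** 2))
    return np.array(out)

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 2))
y = X @ np.array([1.5, -1.0]) + rng.normal(size=80)

per_fold = fold_mses(X, y, k=10)
cv_estimate = per_fold.mean()
# Naive standard error: treats the K fold scores as independent draws,
# ignoring the correlation induced by overlapping training sets.
naive_se = per_fold.std(ddof=1) / np.sqrt(len(per_fold))
print(f"CV MSE {cv_estimate:.3f} +/- {naive_se:.3f} (naive SE)")
```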

Key figures

Mervyn Stone
Seymour Geisser
Trevor Hastie
Robert Tibshirani

Seminal works

stone1974
hastie2009

Frequently asked questions

Why not just measure error on the data used to fit the model?: In-sample error is optimistic because the model has been tuned to that very data, so it understates error on new data. Cross-validation evaluates predictions on data the model did not see during fitting, giving a more honest estimate.
How many folds should I use?: Five or ten folds are common choices that balance bias and variance and keep computation manageable. Leave-one-out uses as many folds as observations, giving low bias but higher variance and greater cost.
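The extra cost of leave-one-out comes from its n refits. For ordinary least squares, however, there is an exact shortcut. The leave-one-out residual equals the ordinary residual e_i divided by 1 - h_ii, where h_ii is the i-th diagonal entry of the hat matrix. This is a special property of least squares and some other linear fitting methods; for most learners the n refits are unavoidable. The sketch below, with hypothetical function names, checks the shortcut against brute force.

```python
import numpy as np

# Hypothetical helpers for illustration; not a library API.
def loocv_brute_force(X, y):
    """n separate least-squares fits, each leaving out one observation."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    sq_err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        sq_err[i] = (y[i] - A[i] @ beta) ** 2
    return sq_err.mean()

def loocv_hat_matrix(X, y):
    """Same quantity from ONE fit: leave-one-out residual = e_i / (1 - h_ii)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.pinv(A)  # hat matrix: projection onto the column space of A
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

rng = np.random.default_rng(11)
X = rng.normal(size=(40, 3))
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=40)
print(loocv_brute_force(X, y), loocv_hat_matrix(X, y))
```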