Overfitting and Underfitting
A model's ability to generalize
Overfitting occurs when a model memorizes noise and incidental patterns in the training data, losing the ability to generalize to new observations. Underfitting is the opposite extreme: the model is too simple to capture the true underlying relationship and performs poorly everywhere. Both conditions undermine predictive validity. Researchers detect these imbalances by comparing training and validation or test performance, then adjust model complexity accordingly.
Concept and Core Logic
A statistical or machine learning model learns a pattern from observed data. The goal is to describe not only the observed sample but also future data generated by the same process. When training error approaches zero while validation error rises, overfitting is present: the model has learned the noise along with the signal. Conversely, when both training and validation errors are high, underfitting is indicated: the model lacks sufficient flexibility. The target is a balance between these extremes, commonly expressed as the bias-variance tradeoff, which for squared-error loss reads: Expected Error = Bias² + Variance + Irreducible Error.
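As a rough illustration of the two regimes, the following Python sketch fits a k-nearest-neighbour regressor at three levels of flexibility. The sine-plus-noise data, the noise level, the sample sizes, and the helper names knn_predict, mse, and make_data are assumptions made for this example; none of them come from the source.

```python
import math
import random

def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error on `data` of the k-NN predictor built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def make_data(rng, n, noise_sd=0.3):
    """Draw n points from y = sin(x) + Gaussian noise (an assumed toy process)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * math.pi)
        points.append((x, math.sin(x) + rng.gauss(0.0, noise_sd)))
    return points

rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(rng, 200)
valid = make_data(rng, 200)

# k = 1 is maximally flexible, k = len(train) always predicts the global mean.
results = {}
for k in (1, 15, len(train)):
    results[k] = (mse(train, train, k), mse(train, valid, k))
    print(f"k={k:3d}  train MSE={results[k][0]:.3f}  validation MSE={results[k][1]:.3f}")
```

With k = 1 each training point is its own nearest neighbour, so the training error is zero while the validation error is not, which is the overfitting signature. With k equal to the training-set size the model predicts a constant, and both errors are high, which is the underfitting signature. The intermediate setting is one plausible compromise, not a recommended value.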
How to Detect and Interpret It
The primary diagnostic tool is the learning curve: training and validation performance, such as RMSE or accuracy, plotted against training-set size. The closely related validation curve plots the same two quantities against model complexity. In overfitting, the two curves diverge: training performance is excellent while validation performance deteriorates. In underfitting, both curves remain close but at a poor level. Cross-validation, for example k-fold CV, provides a more reliable estimate of validation performance than a single train-test split. The test set must not be used during hyperparameter selection; doing so effectively leaks test information into training.
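The separation between cross-validated model selection and a single final test evaluation can be sketched as follows. This is a minimal pure-Python example under assumed conditions: the sine-plus-noise data, the 200/50 split, the candidate values, and the helper names knn_predict, mse, and k_fold_mse are all hypothetical choices for illustration.

```python
import math
import random

def knn_predict(train, x, k):
    """Mean target of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error on `data` of the k-NN predictor built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def k_fold_mse(data, k, n_folds=5):
    """Average validation MSE over n_folds disjoint folds of `data`."""
    scores = []
    for i in range(n_folds):
        held_out = [p for j, p in enumerate(data) if j % n_folds == i]
        fit = [p for j, p in enumerate(data) if j % n_folds != i]
        scores.append(mse(fit, held_out, k))
    return sum(scores) / n_folds

rng = random.Random(1)          # fixed seed so the run is repeatable
data = []
for _ in range(250):
    x = rng.uniform(0.0, 2.0 * math.pi)
    data.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))

# The test set is split off first and never touched during selection.
development, test = data[:200], data[200:]

# Hyperparameter selection uses cross-validation on the development set only.
candidates = (1, 5, 15, 50, 160)
cv_scores = {k: k_fold_mse(development, k) for k in candidates}
best_k = min(cv_scores, key=cv_scores.get)

# The test set is used exactly once, after the choice has been made.
test_mse = mse(development, test, best_k)
print("CV scores:", {k: round(v, 3) for k, v in cv_scores.items()})
print("selected k:", best_k, " test MSE:", round(test_mse, 3))
```

Because the points are drawn independently, the folds are formed by index without shuffling; data with ordering or grouping structure would need a different splitting scheme.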
Common Misconceptions
First misconception: high training accuracy is proof of success. Training accuracy alone can mask overfitting; validation or test accuracy must always be reported alongside it. Second misconception: large datasets make overfitting impossible. Highly parameterized models can overfit even on large datasets. Third misconception: simple models are always safe. Overly simple models miss the true relationship, leading to invalid conclusions through underfitting. Fourth misconception: regularization solves every problem. Regularization reduces overfitting when applied correctly, but its effect is limited if the model architecture is fundamentally misspecified.
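The fourth point can be illustrated with a small sketch in which no regularization strength rescues a misspecified model. The setup is an assumption chosen for simplicity: data follow y = x² plus noise, the misspecified model is a straight line through the origin fitted by one-feature ridge regression, and the names ridge_fit, mse, and sample are hypothetical helpers.

```python
import random

def ridge_fit(features, targets, lam):
    """Closed-form ridge weight for a one-feature model y = w * feature (no intercept)."""
    num = sum(f * t for f, t in zip(features, targets))
    den = sum(f * f for f in features) + lam
    return num / den

def mse(w, features, targets):
    """Mean squared error of the prediction w * feature."""
    return sum((w * f - t) ** 2 for f, t in zip(features, targets)) / len(targets)

NOISE_SD = 0.1
rng = random.Random(2)          # fixed seed so the run is repeatable

def sample(n):
    """Draw n points from y = x^2 + Gaussian noise on [-1, 1] (assumed toy process)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, NOISE_SD) for x in xs]
    return xs, ys

x_train, y_train = sample(200)
x_valid, y_valid = sample(200)

# Misspecified model: linear in x. Sweep the ridge penalty over several orders of magnitude.
linear_errors = {}
for lam in (0.0, 0.1, 1.0, 10.0, 100.0):
    w = ridge_fit(x_train, y_train, lam)
    linear_errors[lam] = mse(w, x_valid, y_valid)
    print(f"linear model, lambda={lam:6.1f}  validation MSE={linear_errors[lam]:.3f}")

# Better-specified model: the same fitting routine, but with x^2 as the feature.
sq_train = [x * x for x in x_train]
sq_valid = [x * x for x in x_valid]
w_sq = ridge_fit(sq_train, y_train, 0.0)
quadratic_error = mse(w_sq, sq_valid, y_valid)
print(f"quadratic feature, lambda=0.0  validation MSE={quadratic_error:.3f}")
```

In this setup the linear model's validation error stays far above the noise variance for every penalty tried, while changing the feature brings the error down sharply. The sketch shows one contrived case and should not be read as a general result about ridge regression.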
Why It Matters and How to Report It
Overfitting inflates the apparent performance of predictive models, potentially misleading policy or clinical decisions. Underfitting conceals real effects, producing erroneous null findings. When reporting, researchers should clearly state three things: the model selection procedure, including which hyperparameters were tuned and what cross-validation strategy was used; both training and validation or test metrics; and, if regularization was applied, its type and strength. Reporting only the best training metric, especially with small sample sizes, seriously undermines scientific reproducibility.
Sources
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer. ISBN: 978-0-387-84857-0