Nonparametric Statistics
Nonparametric statistics draws inferences without assuming a particular parametric form for the underlying distribution, trading some efficiency for robustness and flexibility.
Definition
Nonparametric statistics is the body of methods for estimation and testing that assume only broad qualitative features of the data-generating distribution, such as continuity or smoothness, rather than a finite-dimensional parametric model.
Scope
This area covers distribution-free rank tests such as the sign, Wilcoxon, and Kruskal-Wallis tests, the empirical distribution function and its uniform convergence, nonparametric density and regression estimation by kernels, splines, and local methods, the bias-variance trade-off and bandwidth selection, minimax rates for smooth function classes, and resampling methods including the bootstrap and permutation tests that approximate sampling distributions from the data themselves.
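As a small illustration of the empirical distribution function and its uniform convergence, the sketch below builds F_n from a sample and tracks the largest gap between F_n and the true CDF as the sample grows. It assumes Uniform(0, 1) data, so the true CDF is F(t) = t; the function names `ecdf` and `sup_distance_uniform` are illustrative, not from any library.

```python
import bisect
import random

def ecdf(sample):
    """Return the empirical distribution function F_n of `sample` as a callable."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(t):
        # Fraction of observations less than or equal to t.
        return bisect.bisect_right(xs, t) / n

    return F_n

def sup_distance_uniform(sample):
    """sup_t |F_n(t) - t| for data assumed to lie in [0, 1] (Uniform(0, 1) CDF)."""
    xs = sorted(sample)
    n = len(xs)
    # F_n is a step function, so the supremum is reached at an order statistic,
    # either just after the jump ((i + 1) / n - x) or just before it (x - i / n).
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    # The gap should shrink roughly like 1 / sqrt(n) as n grows.
    print(n, round(sup_distance_uniform(sample), 4))
```

The same distance, computed against a hypothesized CDF rather than a known one, is the Kolmogorov-Smirnov statistic.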
Core questions
- How do rank-based tests achieve validity without assuming a specific distribution?
- How are densities and regression functions estimated, and how is smoothing controlled?
- What is the bias-variance trade-off in smoothing, and how is the bandwidth chosen?
- How do the bootstrap and permutation methods approximate sampling distributions from data?
Key theories
- Distribution-free rank methods
- Replacing data values by their ranks yields test statistics whose null distribution does not depend on the underlying continuous distribution, giving valid tests under minimal assumptions.
- Smoothing and the bias-variance trade-off
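- Kernel and spline estimators of densities and regression functions balance bias against variance through a smoothing parameter (the bandwidth for kernels, the penalty weight or number of knots for splines): more smoothing lowers variance but raises bias. Minimax theory gives the best achievable rate of convergence for a given smoothness class.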
- Resampling
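- The bootstrap approximates the sampling distribution of a statistic by repeatedly resampling the observed data with replacement, while permutation tests build a null distribution by reshuffling labels that are exchangeable under the null hypothesis. Together they provide standard errors, confidence intervals, and tests with few assumptions.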
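The first and third ideas above can be combined in one minimal sketch: a Wilcoxon rank-sum statistic whose null distribution is approximated by permuting group labels. This is an illustration under stated assumptions (two independent samples, Monte Carlo rather than exact enumeration), and the helper names `midranks`, `rank_sum`, and `permutation_p_value` are hypothetical.

```python
import random

def midranks(values):
    """Ranks 1..n of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum(x, y):
    """Wilcoxon rank-sum statistic: sum of the ranks of x in the pooled sample."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[:len(x)])

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the rank-sum statistic."""
    rng = random.Random(seed)
    ranks = midranks(list(x) + list(y))
    m, n = len(x), len(ranks)
    center = m * (n + 1) / 2  # mean of the rank sum under the null
    observed = abs(sum(ranks[:m]) - center)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)  # reassign ranks to groups at random
        if abs(sum(ranks[:m]) - center) >= observed:
            hits += 1
    # The +1 counts the observed labeling itself, so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)

x = [1.1, 2.3, 0.7, 1.9, 2.8]
y = [3.4, 4.1, 2.9, 5.0, 3.7]
print(rank_sum(x, y))  # 15.0: x holds the five smallest pooled values
print(permutation_p_value(x, y))
```

Because only ranks enter the statistic, applying any strictly increasing transformation to all observations leaves the result unchanged, which is the sense in which the test is distribution-free.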
Clinical relevance
Nonparametric methods are indispensable when data are ordinal, skewed, or contaminated by outliers: rank tests are standard in clinical and ecological studies, kernel and spline smoothers describe dose-response and growth curves, and the bootstrap supplies confidence intervals when no closed-form standard error is available.
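A bootstrap confidence interval of the kind mentioned here can be sketched in a few lines. The example uses the percentile method for the median of a skewed sample; the percentile method is one of several bootstrap intervals and is chosen only for simplicity, and the function name `bootstrap_percentile_ci`, the simulated lognormal data, and the default settings are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.median,
                            n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    k = int(n_boot * alpha / 2)  # number of replicates trimmed from each tail
    return reps[k], reps[n_boot - 1 - k]

# Hypothetical skewed data: 50 draws from a lognormal distribution.
rng = random.Random(1)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(50)]

lo, hi = bootstrap_percentile_ci(data)
print("sample median:", round(statistics.median(data), 3))
print("95% percentile interval:", (round(lo, 3), round(hi, 3)))
```

Any statistic that can be computed from a sample can be passed as `stat`, which is why the approach is useful when no analytic interval is at hand.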
History
Distribution-free rank tests emerged with Wilcoxon in 1945, followed by the Mann-Whitney test in 1947 and the Kruskal-Wallis test in 1952. Kernel density estimation developed through the work of Rosenblatt (1956) and Parzen (1962), and Efron's 1979 bootstrap brought computer-intensive resampling to the center of the subject.
Key figures
- Frank Wilcoxon
- Bradley Efron
- Emanuel Parzen
- Larry Wasserman
Seminal works
- wasserman2006
Frequently asked questions
- Are nonparametric methods always better because they assume less?
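- No. Assuming less buys robustness but costs efficiency: when a parametric model is correct, the matching parametric method is generally more powerful, although the loss can be small (under normality the Wilcoxon test has asymptotic relative efficiency 3/π ≈ 0.955 against the t-test). Nonparametric methods are therefore preferred mainly when the model is in doubt.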
- Does nonparametric mean there are no parameters at all?
- No. It means the model is not described by a fixed finite set of parameters; the target may be an entire function, such as a density or regression curve, which is effectively infinite-dimensional.
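To make the idea of an entire function as the target concrete, the sketch below implements a Gaussian kernel density estimate, which returns a function rather than a handful of numbers. The only tuning constant is the bandwidth, which controls the bias-variance trade-off. The function name `gaussian_kde`, the simulated standard normal sample, and the three bandwidth values are assumptions chosen for illustration.

```python
import math
import random

def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate of `sample` as a callable."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def f_hat(t):
        # Average of Gaussian bumps of width `bandwidth` centered at each point.
        return norm * sum(math.exp(-0.5 * ((t - x) / bandwidth) ** 2)
                          for x in sample)

    return f_hat

# Hypothetical data: 200 draws from a standard normal distribution,
# whose true density at 0 is 1 / sqrt(2 * pi), about 0.399.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]

for h in (0.05, 0.4, 2.0):
    f_hat = gaussian_kde(sample, h)
    # Small h: little bias but a noisy estimate. Large h: stable but oversmoothed.
    print(h, round(f_hat(0.0), 3))
```

In practice the bandwidth would be chosen from the data, for example by cross-validation or a plug-in rule, rather than fixed by hand as it is here.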