Statistical Errors and Bias

Sampling error, non-sampling error, bias vs. variance

Errors threatening measurement accuracy in research fall into two main categories: sampling error and non-sampling error. Sampling error is random variation arising from observing only a subset of a population; it shrinks as sample size increases. Non-sampling errors encompass coverage, nonresponse, measurement, and processing problems that larger samples do not reduce. Bias is systematic deviation from the true value, while variance is random spread around an estimate. The total survey error framework organizes all these error sources into a coherent structure for evaluation and reduction.

Core Concepts: What Is Error?

Statistical error is the difference between an estimate and the true value it is meant to represent. Across repeated samples, this error has two components: bias and variance. Bias is the systematic part: the estimator's expected value differs from the true value, so the error points in the same direction across repeated measurements. Variance measures how much estimates fluctuate randomly from sample to sample around their own expected value. Mean squared error (MSE), the expected squared difference between the estimate and the true value, combines both components: MSE = Bias² + Variance. An ideal estimator would minimize both; in practice, however, reducing one often increases the other, a tension known as the bias-variance trade-off.
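The decomposition can be checked numerically. The sketch below is illustrative only: the population values (mean 2, standard deviation 4), the sample size, and the shrinkage factor 0.8 are assumptions chosen for the example, and `shrunk_mean` is a hypothetical estimator, not one taken from the source.

```python
import random
import statistics

TRUE_MEAN = 2.0   # hypothetical population mean
SIGMA = 4.0       # hypothetical population standard deviation
N = 10            # observations per sample
REPS = 5000       # number of repeated samples

def sample_mean(xs):
    return sum(xs) / len(xs)

def shrunk_mean(xs, factor=0.8):
    # Deliberately biased estimator: pulls the sample mean toward zero.
    return factor * sample_mean(xs)

def simulate(estimator, seed=0):
    # Draw REPS independent samples and apply the estimator to each one.
    rng = random.Random(seed)
    return [
        estimator([rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)])
        for _ in range(REPS)
    ]

def decompose(estimates, true_value):
    # Empirical bias, variance, and MSE across the repeated samples.
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse

results = {
    "sample mean": decompose(simulate(sample_mean), TRUE_MEAN),
    "shrunk mean": decompose(simulate(shrunk_mean), TRUE_MEAN),
}
for name, (bias, variance, mse) in results.items():
    print(f"{name}: bias={bias:+.3f} variance={variance:.3f} "
          f"bias^2+variance={bias ** 2 + variance:.3f} mse={mse:.3f}")
```

Under these assumed settings the sample mean is unbiased with theoretical variance σ²/n = 1.6, while the shrunk estimator has bias −0.4 and variance 1.024, for a theoretical MSE of 0.16 + 1.024 = 1.184. A small bias is accepted here in exchange for a lower overall error, which is the trade-off in miniature; with a larger true mean the same shrinkage would raise the MSE instead.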

Sampling Error vs. Non-Sampling Error

Sampling error is the random fluctuation that arises from observing only a sample rather than the entire population. For the mean of a simple random sample, the standard error SE = σ/√n (where σ is the population standard deviation and n the sample size) makes this relationship concrete: as n increases, the error decreases, but only with the square root of n, so halving the standard error requires four times as many observations. Non-sampling errors comprise coverage error (some population units missing from the sampling frame), nonresponse error (selected units not participating), measurement error (incorrect values due to questionnaire or instrument problems), and processing error (mistakes introduced during data entry, coding, or editing). These errors do not shrink with larger samples; instead, they are controlled through careful study design and rigorous field protocols.
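The square-root relationship can be made concrete with a few lines of arithmetic. This is a minimal sketch assuming simple random sampling and a known population standard deviation; the value σ = 10 and the helper names `standard_error` and `required_sample_size` are illustrative choices, not part of the source.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean of a simple random sample of size n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)

def required_sample_size(sigma, target_se):
    """Smallest n whose standard error does not exceed target_se."""
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    return math.ceil((sigma / target_se) ** 2)

SIGMA = 10.0  # hypothetical population standard deviation

for n in (100, 400, 1600):
    # Each fourfold increase in n halves the standard error.
    print(n, standard_error(SIGMA, n))
# prints 100 1.0, then 400 0.5, then 1600 0.25
```

Inverting the formula gives a planning rule: `required_sample_size(10.0, 0.5)` returns 400. None of this arithmetic says anything about coverage, nonresponse, measurement, or processing errors, which are unaffected by n.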

Common Sources of Bias

Selection bias arises when the sampling process produces a group that does not represent the target population; volunteer-based samples are a classic example. Measurement bias occurs when respondents systematically provide incorrect answers due to factors such as social desirability effects or poorly worded questions. A confounder is a third variable associated with both the independent and dependent variables; when uncontrolled, it distorts causal inference. Survivorship bias results from analyzing only the units that passed some selection process while excluding those that failed or dropped out, which typically leads to systematically optimistic conclusions.
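The volunteer-sample case can be illustrated with a small simulation. Everything below is assumed for the sake of the example: the synthetic population (mean 50, standard deviation 10), the logistic opt-in model in which higher scores are more likely to volunteer, and the sample sizes.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 20,000 scores with mean 50 and SD 10.
population = [rng.gauss(50.0, 10.0) for _ in range(20_000)]

def volunteers(pop, rng):
    """Self-selected sample: units with higher scores opt in more often."""
    chosen = []
    for x in pop:
        # Assumed opt-in model: probability rises with the score itself.
        p_join = 1.0 / (1.0 + math.exp(-(x - 50.0) / 10.0))
        if rng.random() < p_join:
            chosen.append(x)
    return chosen

volunteer_sample = volunteers(population, rng)
random_sample = rng.sample(population, 5000)  # simple random sample

pop_mean = statistics.fmean(population)
volunteer_mean = statistics.fmean(volunteer_sample)
random_mean = statistics.fmean(random_sample)

print(f"population mean:  {pop_mean:.2f}")
print(f"volunteer sample: {volunteer_mean:.2f} (n={len(volunteer_sample)})")
print(f"random sample:    {random_mean:.2f} (n={len(random_sample)})")
```

In this setup the volunteer sample is larger than the random sample yet overestimates the population mean, because participation depends on the quantity being measured. The random sample is smaller but centers on the true value.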

Importance in Research Practice

The Total Survey Error (TSE) framework consolidates the error types described above under a single evaluative scheme, enabling researchers to allocate resources deliberately across error sources. A large sample does not automatically eliminate bias: increasing n shrinks only the sampling error, so millions of observations collected with a biased instrument yield a very precise estimate of the wrong value and can mislead more than a small but unbiased sample. Reliable research therefore requires identifying and minimizing bias sources before seeking to increase sample size. Explicitly distinguishing error types in research reports, rather than reporting a single summary statistic, increases the credibility and interpretability of findings.
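The comparison between a large biased sample and a small unbiased one follows directly from MSE = Bias² + Variance. The sketch below works through one hypothetical case; the bias of 2 units, σ = 10, and both sample sizes are assumed values picked only to show the arithmetic.

```python
import math

def rmse(bias, sigma, n):
    """Root mean squared error of a sample mean that carries a fixed bias."""
    return math.sqrt(bias ** 2 + sigma ** 2 / n)

SIGMA = 10.0  # hypothetical population standard deviation

# Small sample, unbiased instrument.
small_unbiased = rmse(bias=0.0, sigma=SIGMA, n=400)

# One million observations, instrument biased by 2 units (assumed).
large_biased = rmse(bias=2.0, sigma=SIGMA, n=1_000_000)

print(f"n=400, unbiased:     RMSE = {small_unbiased:.3f}")
print(f"n=1,000,000, biased: RMSE = {large_biased:.3f}")
```

Here the small unbiased sample has an RMSE of 0.5, while the million-observation sample cannot go below 2 no matter how large it grows, since the variance term vanishes and the bias term does not.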

Key Thinkers

Robert M. Groves (1948–): one of the principal architects of the total survey error framework and co-author of foundational texts on survey methodology.

Sources

Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2009). Survey Methodology (2nd ed.). John Wiley & Sons. ISBN: 978-0-470-46546-2