The Central Limit Theorem
Why the sample mean tends toward a normal distribution
The Central Limit Theorem states that the distribution of the sum (or mean) of independent, identically distributed random variables with finite variance converges to a normal distribution as sample size grows, regardless of the shape of the underlying population. This makes it the theoretical cornerstone of normal-based statistical inference, including confidence intervals and hypothesis tests.
Core Idea and Definition
Let X₁, X₂, …, Xₙ be independent and identically distributed (i.i.d.) random variables with mean μ and finite, nonzero variance σ² (0 < σ² < ∞). The standardized sample mean, Zₙ = (X̄ − μ) / (σ/√n), converges in distribution to the standard normal N(0,1) as n → ∞. In practical terms, for large n the sample mean X̄ is approximately normal with mean μ and standard deviation SE = σ/√n, the standard error of the mean, which shrinks in proportion to 1/√n as the sample size grows. The theorem's power lies in imposing no assumption on the shape of the underlying population distribution beyond finite variance.
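The convergence of the standardized mean can be checked by simulation. The sketch below is illustrative only: it assumes an Exponential(rate = 1) population (for which μ = 1 and σ = 1), and the function names, replication count, and seed are arbitrary choices rather than part of the theorem. It reports how much probability the standardized mean places in each tail beyond ±1.96; under N(0,1) each tail would hold about 0.025.

```python
import math
import random
import statistics

MU = 1.0     # mean of Exponential(rate=1), the assumed population
SIGMA = 1.0  # standard deviation of Exponential(rate=1)

def standardized_means(n, reps, rng):
    """Return `reps` standardized sample means, each from n exponential draws."""
    z_values = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z_values.append((xbar - MU) / (SIGMA / math.sqrt(n)))
    return z_values

def summarize(n, reps=2000, seed=0):
    """Mean and sd of the standardized means, plus the share in each 1.96 tail."""
    rng = random.Random(seed)
    z = standardized_means(n, reps, rng)
    lower = sum(v < -1.96 for v in z) / reps
    upper = sum(v > 1.96 for v in z) / reps
    return statistics.fmean(z), statistics.pstdev(z), lower, upper

if __name__ == "__main__":
    for n in (2, 20, 200):
        mean_z, sd_z, lower, upper = summarize(n)
        print(f"n={n:4d}  mean={mean_z:+.3f}  sd={sd_z:.3f}  "
              f"P(Z<-1.96)={lower:.3f}  P(Z>1.96)={upper:.3f}")
```

The mean and standard deviation of the standardized values should sit near 0 and 1 at every n, since standardization fixes them by construction; what changes with n is the shape. For small n the two tails are visibly unequal because the exponential population is right-skewed, and they should move toward 0.025 each as n grows.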
How It Works: Conditions and Limits
The classical theorem requires three conditions: (1) Independence: observations must not influence one another; (2) Identical distribution: each observation must come from the same probability distribution; (3) Finite variance: σ² < ∞. For heavy-tailed distributions such as the Cauchy distribution, where neither the mean nor the variance is defined, the theorem does not apply; the mean of n standard Cauchy observations has the same distribution as a single observation, so averaging does not reduce its spread. Strong serial dependence also breaks the standard form, though variants exist for time series settings. The Lyapunov and Lindeberg conditions provide more general formulations that relax the identical distribution requirement while keeping independence.
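The failure of the finite-variance condition can be seen numerically. The following sketch is a simulation under assumed settings (n = 100, 2000 replications, a fixed seed; all names are illustrative). It compares the interquartile range (IQR) of sample means drawn from a standard normal population with that of sample means drawn from a standard Cauchy population, generated by the inverse-CDF formula tan(π(U − 1/2)). The IQR is used because the Cauchy distribution has no standard deviation to measure.

```python
import math
import random
import statistics

def draw_normal(rng):
    """One standard normal observation (finite variance)."""
    return rng.gauss(0.0, 1.0)

def draw_cauchy(rng):
    """One standard Cauchy observation via the inverse CDF (variance undefined)."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_means(draw, n, reps=2000, seed=0):
    """Interquartile range of `reps` sample means, each from n draws."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

if __name__ == "__main__":
    n = 100
    print("normal IQR of means:", round(iqr_of_means(draw_normal, n), 3))
    print("Cauchy IQR of means:", round(iqr_of_means(draw_cauchy, n), 3))
```

For the normal population the IQR of the mean should be close to 1.349/√n, shrinking as n increases. For the Cauchy population it should stay near 2, the IQR of a single standard Cauchy observation, whatever the value of n.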
Common Misconceptions
The most frequent error is applying the theorem to individual observations: the CLT describes the distribution of sample means (or sums), not of the raw data, which keep the shape of the population distribution however many observations are collected. A second misconception is assuming that 'sufficiently large n' always means about 30; the required sample size depends on the skewness and tail weight of the population and on the desired precision, and sometimes far larger samples are needed. A third misconception is concluding that the CLT automatically satisfies the normality assumptions of any statistical test; some procedures assume that the population itself is normally distributed, which the CLT does not guarantee.
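To illustrate why no single sample size is always sufficient, the sketch below estimates the skewness of the sampling distribution of the mean at n = 30 for three assumed populations: Uniform(0, 1), Exponential(rate = 1), and Lognormal(0, 1). The populations, replication count, and seed are illustrative choices. For i.i.d. data with a finite third moment, the skewness of the sample mean equals the population skewness divided by √n, so a strongly skewed population leaves the mean noticeably asymmetric at n = 30.

```python
import random
import statistics

def sample_skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

def skewness_of_means(draw, n=30, reps=4000, seed=0):
    """Skewness of `reps` simulated sample means, each from n draws."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    return sample_skewness(means)

# Assumed example populations, ordered from symmetric to strongly right-skewed.
POPULATIONS = {
    "uniform(0,1)": lambda rng: rng.uniform(0.0, 1.0),
    "exponential(1)": lambda rng: rng.expovariate(1.0),
    "lognormal(0,1)": lambda rng: rng.lognormvariate(0.0, 1.0),
}

if __name__ == "__main__":
    for name, draw in POPULATIONS.items():
        print(f"{name:15s} skewness of mean at n=30: {skewness_of_means(draw):+.3f}")
```

A skewness near zero suggests the normal approximation is already reasonable at n = 30; a clearly positive value indicates that a larger sample is needed before normal-based tail probabilities can be trusted.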
Significance in Research Practice
The Central Limit Theorem is the backbone of applied statistics. Constructing confidence intervals based on the z or t distribution, running one- and two-sample t-tests, and using standard errors of regression coefficients for inference all implicitly invoke the CLT whenever the underlying data are not themselves normal. In fields such as social science, medicine, and engineering, researchers routinely measure processes that are not normally distributed yet can still draw approximately valid normal-based inferences about sample means, provided the samples are large enough. This flexibility makes the theorem one of the indispensable theoretical foundations of modern empirical research.
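As a rough illustration of CLT-based inference on non-normal data, the sketch below builds a normal-based 95% confidence interval for the mean, x̄ ± 1.96·s/√n, and estimates by simulation how often it covers the true mean. It assumes an Exponential(rate = 1) population with true mean 1; the function names, replication count, and seed are illustrative, and the interval uses the z quantile rather than a t quantile to keep the example short.

```python
import math
import random
import statistics

Z_975 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
TRUE_MEAN = 1.0  # mean of the assumed Exponential(rate=1) population

def normal_ci(sample, z=Z_975):
    """Normal-based confidence interval for the mean using the sample sd."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
    return xbar - z * se, xbar + z * se

def coverage(n, reps=2000, seed=0):
    """Share of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        low, high = normal_ci(sample)
        hits += low <= TRUE_MEAN <= high
    return hits / reps

if __name__ == "__main__":
    for n in (5, 30, 100):
        print(f"n={n:3d}  estimated coverage of nominal 95% interval: {coverage(n):.3f}")
```

With a skewed population, coverage at small n should fall short of the nominal 95% and move closer to it as n increases, which is the practical sense in which normal-based inference about the mean leans on the theorem.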
Key thinkers
- Pierre-Simon Laplace (1749–1827): French mathematician who formulated an early version of the theorem, showing that the sum of errors converges to a normal distribution for large samples.
- Aleksandr Lyapunov (1857–1918): Russian mathematician who in 1901 provided a rigorous proof of the theorem under the Lyapunov condition, relaxing the requirement of identical distributions.
Sources
- Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1 (3rd ed.). John Wiley & Sons. ISBN: 978-0-471-25708-0