The Normal Distribution

The bell curve and z-scores

The normal distribution is a symmetric bell curve defined entirely by its mean and standard deviation. Approximately 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean, a pattern known as the empirical rule. Standardizing an observation via z = (x − μ)/σ places it on the standard normal scale, enabling comparisons across different measurement scales. Thanks to the central limit theorem, the sampling distribution of the mean approaches normality as sample size grows, which makes the normal distribution the backbone of classical statistical inference.

Core Definition and Properties

The normal distribution is also called the Gaussian distribution, after Carl Friedrich Gauss, who analysed it systematically. It is fully determined by just two parameters: the mean (μ), which locates the centre of the curve, and the standard deviation (σ), which sets its spread. The curve is perfectly symmetric about μ, so the mean, median, and mode all coincide at the same point. As σ increases the curve spreads and flattens; as σ decreases it becomes taller and narrower. In theory the curve never touches the horizontal axis: it extends across all real numbers, and the total area beneath it equals exactly 1.
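
The properties above can be checked numerically. The following is a minimal sketch, assuming the standard density formula for N(μ, σ²) (not written out in the text) and a simple trapezoidal integration over μ ± 10σ; the function names and the step count are illustrative choices, not part of any cited method.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(mu, sigma, width=10.0, steps=20_000):
    """Trapezoidal estimate of the area between mu - width*sigma and mu + width*sigma."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

# Height of the curve at the mean for a narrow and a wide distribution.
peak_narrow = normal_pdf(0.0, mu=0.0, sigma=1.0)
peak_wide = normal_pdf(0.0, mu=0.0, sigma=2.0)

# The total area should be (numerically) 1 regardless of sigma.
area_narrow = area_under_curve(0.0, 1.0)
area_wide = area_under_curve(0.0, 2.0)

print(f"peak at sigma=1: {peak_narrow:.4f}, peak at sigma=2: {peak_wide:.4f}")
print(f"area at sigma=1: {area_narrow:.6f}, area at sigma=2: {area_wide:.6f}")
```

Doubling σ halves the peak height while the area stays at 1, which is the "spreads and flattens" behaviour described above.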

The Empirical Rule and Standardization

The empirical rule (68–95–99.7 rule) summarises how observations distribute around the mean: approximately 68% of values fall within μ ± 1σ, 95% within μ ± 2σ, and 99.7% within μ ± 3σ. A z-score expresses how many standard deviations an observation lies above or below the mean, and is obtained from z = (x − μ)/σ. If x comes from a normal distribution, its z-score follows the standard normal distribution, Z ~ N(0,1), which has mean 0 and standard deviation 1. Once on this common scale, probabilities can be read from a standard normal table or computed directly by statistical software.
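
As a rough illustration of both ideas, the sketch below uses `statistics.NormalDist` from the Python standard library to recover the empirical-rule percentages and to standardize one observation. The exam score of 82 with mean 70 and standard deviation 8 is a made-up example, and the helper names `z_score` and `prob_within` are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # Z ~ N(0, 1)

def z_score(x, mu, sigma):
    """Standardize x: number of standard deviations from the mean."""
    return (x - mu) / sigma

def prob_within(k):
    """P(-k < Z < k) for a standard normal variable."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Empirical rule: coverage within 1, 2 and 3 standard deviations
# (approximately 0.683, 0.954 and 0.997).
coverage = {k: prob_within(k) for k in (1, 2, 3)}

# Hypothetical observation: a score of 82 where mu = 70 and sigma = 8.
z = z_score(82, mu=70, sigma=8)   # (82 - 70) / 8 = 1.5
p_below = std_normal.cdf(z)       # share of scores expected below 82, about 0.933

for k, p in coverage.items():
    print(f"within {k} sd: {p:.4f}")
print(f"z = {z:.2f}, P(X < 82) = {p_below:.4f}")
```

The `cdf` call plays the role of the standard normal table mentioned above; the probability is meaningful only to the extent that the scores really are normally distributed.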

Common Misconceptions

The most common misconception is assuming that all real-world data follow a normal distribution; in practice, income, reaction times, and many biological measurements are skewed or multimodal. A second misconception is that the central limit theorem makes the data themselves normal, or that normality can be taken for granted in small samples. The theorem states only that the sampling distribution of the mean approaches normality as sample size grows, not that individual observations are normal. A third error is applying the empirical rule percentages to any distribution; those specific values (68%, 95%, 99.7%) hold only for the normal distribution.
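
A small simulation can make the second and third points concrete. The sketch below assumes an exponential distribution with rate 1 as a stand-in for skewed data, a sample size of 50, and a fixed random seed; all of these are illustrative choices rather than anything prescribed by the source.

```python
import random
import statistics

def skewness(values):
    """Moment-based sample skewness (0 for a perfectly symmetric sample)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def share_within_one_sd(values):
    """Fraction of values lying within one standard deviation of the mean."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(1 for v in values if abs(v - m) <= s) / len(values)

rng = random.Random(42)  # fixed seed so the run is reproducible

# Individual observations from a strongly right-skewed distribution.
raw = [rng.expovariate(1.0) for _ in range(20_000)]

# Means of many samples of size 50 drawn from the same distribution.
sample_size = 50
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(4_000)
]

skew_raw = skewness(raw)
skew_means = skewness(means)
share_raw = share_within_one_sd(raw)
share_means = share_within_one_sd(means)

print(f"skewness: raw = {skew_raw:.2f}, sample means = {skew_means:.2f}")
print(f"within 1 sd: raw = {share_raw:.3f}, sample means = {share_means:.3f}")
```

The raw observations stay skewed however many are collected, and noticeably more than 68% of them fall within one standard deviation of the mean. Only the distribution of the sample means moves toward symmetry and toward the empirical-rule percentage.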

Importance in Research Practice

The normal distribution underpins the theoretical foundations of most parametric methods, including the t-test, ANOVA, and linear regression, all of which assume that residuals or sample means are normally distributed. Because the central limit theorem makes the sampling distribution of the mean approximately normal, inference about means is often reasonably robust in large samples. It is good practice for researchers to check the normality assumption using tools such as histograms, Q-Q plots, or the Shapiro-Wilk test. These diagnostics are themselves sensitive to sample size: formal tests have little power to detect departures in small samples and flag even trivial departures in very large ones. When normality cannot reasonably be assumed, applying a transformation or choosing a non-parametric alternative is the methodologically sounder course.
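
The idea behind a Q-Q plot can be sketched without plotting: sort the data, pair each value with the standard normal quantile at the same rank, and see how close the pairs lie to a straight line. The example below is a minimal standard-library sketch under stated assumptions: the plotting positions (i − 0.5)/n are one common convention among several, the correlation coefficient is used here only as a rough summary of straightness, and the two simulated samples are made up. It is not the Shapiro-Wilk test; in practice a library routine such as SciPy's `scipy.stats.shapiro` would normally be used for that.

```python
import math
import random
from statistics import NormalDist, fmean

def qq_points(sample):
    """Return (theoretical normal quantiles, ordered sample values)."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed for reproducibility

# Made-up data: one roughly normal sample and one right-skewed sample.
normal_sample = [rng.gauss(50.0, 10.0) for _ in range(500)]
skewed_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]

r_normal = pearson_r(*qq_points(normal_sample))
r_skewed = pearson_r(*qq_points(skewed_sample))

print(f"Q-Q correlation, normal sample: {r_normal:.4f}")
print(f"Q-Q correlation, skewed sample: {r_skewed:.4f}")
```

A correlation very close to 1 means the Q-Q points lie almost on a line, which is consistent with normality; a clearly lower value signals curvature of the kind produced by skew. Plotting the points returned by `qq_points` remains more informative than the single number, because it shows where the departure occurs.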

Sources

  1. Moore, D. S., McCabe, G. P., & Craig, B. A. (2017). Introduction to the Practice of Statistics (9th ed.). W. H. Freeman. ISBN: 978-1-319-01338-7