Data Distribution and Normality

The distribution of a variable describes how its values are spread across the range of possibilities, and many descriptive and inferential methods depend on what that distribution looks like. Normality — whether data follow the symmetric, bell-shaped normal distribution — is the distributional assumption most often examined in health research, because it governs the choice between parametric and non-parametric summaries and tests.

Definition

A statistical distribution describes the relative frequency or probability of a variable's possible values; normality refers to conformity with the Gaussian (normal) distribution, a symmetric bell-shaped form assessed graphically and with formal tests to decide whether parametric methods are appropriate.

Scope

This entry covers distributional shape (symmetry, skewness, kurtosis), the normal distribution and why it matters, and how normality is assessed through graphical inspection and formal tests. It is a methodological reference and does not provide clinical guidance.

Core questions

What shape does the variable's distribution take, and is it symmetric or skewed?
Is the assumption of normality reasonable for this variable?
Which graphical and formal tools best assess normality, and how do they behave with small or large samples?

Key concepts

Normal (Gaussian) distribution
Skewness and kurtosis
Graphical assessment (histogram, Q-Q plot)
Shapiro-Wilk test
Kolmogorov-Smirnov test
Parametric versus non-parametric choice
Sample-size sensitivity of normality tests

Key theories

Central limit theorem: The central limit theorem states that, for independent observations from a population with finite variance, the sampling distribution of the mean approaches a normal distribution as the sample size grows, whatever the shape of the underlying variable. It is the reason normal-theory methods often remain serviceable for means even when the raw data are not normal, although the sample size needed increases with the skewness and tail weight of the raw data.
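The behaviour the theorem describes can be illustrated with a small simulation. The sketch below is illustrative only: the exponential population, the sample size of 50, the 2000 replicates, and the helper name `skewness` are all assumptions chosen for the example, not values taken from this entry.

```python
import random
from statistics import fmean


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)   # fixed seed so the run is repeatable
SAMPLE_SIZE = 50         # assumed size of each sample
N_SAMPLES = 2000         # assumed number of repeated samples

raw_values = []
sample_means = []
for _ in range(N_SAMPLES):
    # Exponential data are strongly right-skewed (theoretical skewness 2).
    sample = [rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    raw_values.extend(sample)
    sample_means.append(fmean(sample))

print("skewness of raw values:  ", round(skewness(raw_values), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
```

Under these assumptions the raw values stay clearly skewed while the sample means are much closer to symmetric, which is the pattern the theorem predicts. A more heavily skewed population would need larger samples to show the same effect.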

Mechanisms

Normality is assessed in two complementary ways. Graphical methods (the histogram and the quantile-quantile, or Q-Q, plot) show departures such as skew, heavy tails, or bimodality directly. Formal tests, of which the Shapiro-Wilk test is among the most widely used, return a p-value: the probability, if the data truly came from a normal distribution, of obtaining a test statistic at least as extreme as the one observed. Because these tests gain power with sample size, they tend to flag trivial departures in large samples and miss meaningful ones in small samples, so graphical inspection and the practical consequences of non-normality are weighed alongside any test result. When the quantity of interest is a mean, the central limit theorem often justifies normal-theory methods even for non-normal raw data.
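The coordinates behind a normal Q-Q plot can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the plotting positions (i + 0.5) / n are one common convention among several, the two simulated samples and the helper names `qq_points`, `pearson_r`, and `qq_correlation` are hypothetical, and the Q-Q correlation is a rough straightness summary, not the Shapiro-Wilk statistic. In applied work the formal test itself is available in statistical software, for example as `scipy.stats.shapiro` in SciPy.

```python
import random
from statistics import NormalDist, fmean


def qq_points(sample):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    n = len(sample)
    observed = sorted(sample)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Plotting positions (i + 0.5) / n are an assumed, commonly used choice.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def qq_correlation(sample):
    """Correlation of the Q-Q points; values near 1 mean a nearly straight plot."""
    return pearson_r(*qq_points(sample))


rng = random.Random(42)  # fixed seed so the run is repeatable
roughly_normal = [rng.gauss(120, 15) for _ in range(200)]        # simulated
right_skewed = [rng.lognormvariate(1.0, 1.0) for _ in range(200)]  # simulated

r_normal = qq_correlation(roughly_normal)
r_skewed = qq_correlation(right_skewed)
print("Q-Q correlation, roughly normal sample:", round(r_normal, 3))
print("Q-Q correlation, right-skewed sample:  ", round(r_skewed, 3))
```

Plotting the pairs from `qq_points` gives the Q-Q plot itself; points that bend away from a straight line at one end indicate skew, and bending at both ends indicates heavy or light tails.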

Clinical relevance

Whether a biomarker, length of stay, or score is treated as normal determines how it is summarised and analysed throughout the clinical literature, so judging normality is part of appraising a study's methods. This entry describes assessment of distributional assumptions and is not a basis for individual diagnostic or treatment decisions.

Epidemiology

Many biological and clinical measurements are right-skewed (for example, hormone levels, costs, and waiting times), so normality cannot be assumed and is routinely checked. The decision shapes whether results are reported with means and standard deviations or with medians and interquartile ranges, and whether parametric or non-parametric tests are used.
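The effect of right skew on the choice of summary can be seen in a small worked example. The length-of-stay values below are invented purely for illustration and do not come from any study; the quartiles use the default method of Python's `statistics.quantiles`, which is one of several conventions.

```python
from math import exp, log
from statistics import fmean, median, quantiles, stdev

# Hypothetical length-of-stay values in days, invented for illustration only.
los_days = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 14, 21, 45]

# Parametric-style summary: mean and standard deviation.
mean_los = fmean(los_days)
sd_los = stdev(los_days)

# Non-parametric-style summary: median and interquartile range.
median_los = median(los_days)
q1, _, q3 = quantiles(los_days, n=4)  # default "exclusive" quartile method

# Transformed alternative: averaging on the log scale and back-transforming
# gives the geometric mean, which is less affected by the long right tail.
geometric_mean = exp(fmean([log(x) for x in los_days]))

print(f"mean (SD):      {mean_los:.1f} ({sd_los:.1f})")
print(f"median (IQR):   {median_los} ({q1:g} to {q3:g})")
print(f"geometric mean: {geometric_mean:.1f}")
```

In this made-up sample a few long stays pull the mean well above the median, and the mean minus one standard deviation is negative, an impossible length of stay. That is the kind of distortion that leads authors to report medians and interquartile ranges for skewed variables.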

History

The normal distribution was developed in the eighteenth and nineteenth centuries in the work of de Moivre, Laplace, and Gauss, and became central to statistics through the theory of errors and the central limit theorem. Formal tools for checking the assumption followed in the twentieth century, with Shapiro and Wilk's 1965 analysis-of-variance test for normality becoming a standard procedure in applied work.

Debates

Should normality be judged by formal tests or by graphical inspection?: Formal normality tests are sensitive to sample size — rejecting trivial departures in large samples and failing to detect important ones in small samples — so many methodologists recommend that graphical assessment and the practical robustness of the planned analysis guide the decision rather than a test's p-value alone.
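The sample-size sensitivity at the centre of this debate can be shown with a deliberately simple sketch. Two substitutions are made and should be read as assumptions: the Jarque-Bera test stands in for Shapiro-Wilk, only because its large-sample p-value has a closed form that needs no extra library, and the "samples" are idealised quantiles of a mildly skewed lognormal shape instead of random draws, so the same shape is examined at every sample size. The chi-square approximation used for the p-value is crude in small samples, and the function names are hypothetical.

```python
from math import exp
from statistics import NormalDist, fmean


def jarque_bera(sample):
    """Jarque-Bera statistic and its large-sample p-value (chi-square, 2 df)."""
    n = len(sample)
    mean = fmean(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    statistic = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    p_value = exp(-statistic / 2.0)
    return statistic, p_value


def idealised_skewed_sample(n, sigma=0.2):
    """Evenly spaced quantiles of a mildly right-skewed lognormal shape.

    sigma=0.2 is an assumed value giving a modest departure from normality.
    """
    std_normal = NormalDist()
    return [exp(sigma * std_normal.inv_cdf((i + 0.5) / n)) for i in range(n)]


results = {}
for n in (20, 200, 2000):
    statistic, p_value = jarque_bera(idealised_skewed_sample(n))
    results[n] = (statistic, p_value)
    print(f"n = {n:>4}: statistic = {statistic:8.2f}, p = {p_value:.3g}")
```

With the shape held fixed, the test statistic grows roughly in proportion to the sample size, so the same mild skew goes undetected at n = 20 and is rejected decisively at n = 2000. This is why a p-value alone says little about whether a departure is large enough to matter for the planned analysis.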

Key figures

Samuel S. Shapiro
Martin B. Wilk
Carl Friedrich Gauss

Seminal works

shapiro-wilk-1965
kwak-2017
ghasemi-2012

Frequently asked questions

Why does normality matter?: Many common tests (t-test, ANOVA) assume approximately normal data, and the mean and standard deviation describe a strongly skewed variable poorly; when the assumption fails, those summaries and tests can mislead, and non-parametric or transformed alternatives may be more appropriate.
Is a significant Shapiro-Wilk test reason enough to abandon a parametric method?: Not on its own. The test becomes very sensitive in large samples and underpowered in small ones, so the size of the departure, the shape seen on a Q-Q plot, and the robustness of the planned analysis should all be considered.