The Sampling Distribution

How a statistic varies across samples

A sampling distribution is the distribution of a statistic — such as the sample mean — computed across all possible samples of a given size drawn from a population. Its spread is measured by the standard error. Hypothesis tests and confidence intervals are statements about where an observed statistic falls within its sampling distribution; the central limit theorem describes the shape of that distribution. The sampling distribution is therefore the conceptual bridge that makes statistical inference possible.

The Core Idea and Definition

In practice a researcher collects only one sample, yet statistical inference requires thinking about all possible samples of the same size that could be drawn from the same population. The statistic of interest — say, the sample mean — takes a different value in each hypothetical sample. The distribution formed by all those values is the sampling distribution. It is the bridge connecting a single observed statistic to the population parameter; without it, hypothesis tests and confidence intervals have no logical foundation.
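For a very small population, "all possible samples" can be listed exhaustively, which makes the definition concrete. The following is a minimal sketch, not a general method: the five-value population, the sample size of 2, and the function name sampling_distribution_of_mean are hypothetical choices made for illustration, and samples are assumed to be unordered and drawn without replacement.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical five-value population, chosen only for illustration.
population = [2, 4, 6, 8, 10]


def sampling_distribution_of_mean(values, n):
    """Count how many of the possible size-n samples yield each sample mean.

    Samples are unordered and drawn without replacement (an assumption
    of this sketch).
    """
    return Counter(mean(sample) for sample in combinations(values, n))


dist = sampling_distribution_of_mean(population, 2)
total = sum(dist.values())

# Each line is one point of the sampling distribution of the mean.
for value in sorted(dist):
    print(f"mean = {value}: {dist[value]} of {total} samples")

# The mean of the sampling distribution equals the population mean.
mean_of_means = sum(value * count for value, count in dist.items()) / total
print(mean_of_means, mean(population))
```

With five values and samples of size 2 there are only 10 possible samples, so the whole sampling distribution fits in a short table. For realistic populations the enumeration is infeasible, which is why theory or simulation is used instead.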

The Standard Error and How It Works

The spread of the sampling distribution is measured by the standard error (SE). For the mean of n independent observations, the standard error is the population standard deviation divided by the square root of the sample size: SE = SD / √n. This formula captures two key truths: as sample size increases, SE shrinks and estimates become more precise, though only in proportion to √n, so quadrupling the sample size halves the SE; and as population variability increases (larger SD), sample means spread more widely. In practice, when the population SD is unknown, the sample SD is substituted to obtain the estimated standard error.
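The formula can be checked against a simulation. The sketch below assumes a hypothetical normal population with mean 50 and SD 10, and independent draws; the function names, the seed, and the number of repetitions are illustrative choices, not part of the source.

```python
import math
import random
import statistics


def standard_error(sd, n):
    """Standard error of the sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def simulate_sample_means(mu, sd, n, reps, seed=0):
    """Draw `reps` independent samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sd) for _ in range(n)])
        for _ in range(reps)
    ]


# Hypothetical population: mean 50, SD 10.
MU, SD = 50.0, 10.0

for n in (25, 100):
    means = simulate_sample_means(MU, SD, n, reps=5000)
    # Spread of the simulated sample means, to compare with the formula.
    empirical = statistics.pstdev(means)
    print(f"n={n}: formula SE={standard_error(SD, n):.2f}, simulated={empirical:.2f}")
```

The formula gives an SE of 2 for n = 25 and 1 for n = 100, and the standard deviation of the simulated means should land close to those values, up to simulation noise.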

The Central Limit Theorem and the Shape of the Distribution

The central limit theorem provides a powerful result about the shape of the sampling distribution: for independent observations from a population with finite variance, whatever that population's shape, the sampling distribution of the sample mean approaches a normal distribution as the sample size grows. A common rule of thumb treats n ≥ 30 as sufficiently large, although strongly skewed or heavy-tailed populations can require considerably more. This is enormously valuable because the population distribution is rarely known in practice. The approximately normal shape allows researchers to compute z and t statistics, interpret p-values, and construct confidence intervals. Understanding this theorem clarifies why sample-size choices matter so much in research design.
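One way to see the theorem at work is to sample from a clearly non-normal population and watch the asymmetry of the sample means fade as n grows. The sketch below assumes an exponential population with rate 1 (a strongly right-skewed shape) and uses skewness as a rough, single-number measure of departure from normality; the function names, sample sizes, and seed are illustrative assumptions.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the mean cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def exponential_sample_means(n, reps, seed=0):
    """Means of `reps` samples of size n from an exponential population (rate 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


for n in (2, 10, 50):
    means = exponential_sample_means(n, reps=5000)
    print(f"n={n}: skewness of sample means = {skewness(means):.2f}")
```

The skewness of the sample means should shrink toward 0 as n increases, even though the population itself stays skewed. Skewness near 0 does not by itself establish normality, so a histogram or quantile plot would be a natural complement to this check.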

Common Misconceptions and Importance in Research Practice

The most common confusion is conflating the sampling distribution with the distribution of individual observations in a sample (the sample distribution). These are distinct: the sample distribution describes raw data, while the sampling distribution describes how a statistic varies across samples. Another frequent error is assuming that the normal approximation from the central limit theorem can be relied on at any sample size. With small samples from a skewed or heavy-tailed population the approximation can be poor; only when the population itself is normal is the sample mean normally distributed at every sample size. In research practice, grasping the sampling distribution is essential for understanding why statistics vary, how to interpret confidence intervals, and how to justify sample-size planning, which makes it foundational to frequentist inference.
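The interpretation of a confidence interval as a statement about repeated sampling can be illustrated by simulation. The sketch below is a simplified case under stated assumptions: a hypothetical normal population with mean 50 and a known SD of 10, samples of size 25, and a 95% z-interval. The function name coverage, the seed, and the number of repetitions are illustrative.

```python
import math
import random
import statistics

# Two-sided 95% critical value of the standard normal (about 1.96).
Z_95 = statistics.NormalDist().inv_cdf(0.975)


def coverage(mu, sd, n, reps, seed=0):
    """Fraction of 95% z-intervals (known SD) that contain the true mean."""
    rng = random.Random(seed)
    half_width = Z_95 * sd / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample_mean = statistics.fmean([rng.gauss(mu, sd) for _ in range(n)])
        # The interval sample_mean +/- half_width covers mu exactly when
        # the sample mean lies within half_width of mu.
        if abs(sample_mean - mu) <= half_width:
            hits += 1
    return hits / reps


print(coverage(mu=50.0, sd=10.0, n=25, reps=4000))
```

The printed fraction should be close to 0.95. The 95% describes the long-run behavior of the interval procedure across samples, not the probability that any single computed interval contains the parameter.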

Sources

  1. Moore, D. S., McCabe, G. P., & Craig, B. A. (2017). Introduction to the Practice of Statistics (9th ed.). W. H. Freeman. ISBN: 978-1-319-01338-7