Confidence Intervals
Interval estimation and its correct interpretation
A confidence interval expresses a parameter estimate as a range of plausible values rather than a single point. A 95% confidence interval means that, over many repeated samples, 95% of the intervals constructed by the same procedure would contain the true parameter. This confidence is a property of the procedure, not of any single computed interval. The width reflects estimation precision, while the location reflects the magnitude of the effect.
Core Idea and Definition
A confidence interval consists of two bounds derived from sample data. In classical (frequentist) statistics, a 95% confidence interval is defined as follows: if many independent samples were drawn from the same population and an interval were computed from each using the same procedure, approximately 95% of those intervals would contain the true parameter. For a mean with known population standard deviation σ, the standard error is SE = σ/√n, and the 95% interval is commonly written as x̄ ± 1.96 × SE, where 1.96 is the standard normal critical value for 95% confidence. When σ is unknown, the sample standard deviation s replaces it and the critical value is taken from the t distribution with n − 1 degrees of freedom. The confidence level (e.g., 95% or 99%) is chosen by the researcher before analysis.
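The definition above can be checked by simulation. The following is a minimal sketch, not a definitive implementation: it assumes normally distributed data with a known σ, and the function names `mean_ci_known_sigma` and `coverage`, as well as the parameter values, are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def mean_ci_known_sigma(sample, sigma, level=0.95):
    """Return (lower, upper) for the mean, assuming sigma is known."""
    n = len(sample)
    se = sigma / math.sqrt(n)                  # SE = sigma / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    center = fmean(sample)
    return center - z * se, center + z * se


def coverage(true_mean, sigma, n, trials, level=0.95, seed=0):
    """Fraction of simulated intervals that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build an interval with the same procedure.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lower, upper = mean_ci_known_sigma(sample, sigma, level)
        hits += lower <= true_mean <= upper
    return hits / trials


if __name__ == "__main__":
    # Illustrative values only: the printed fraction should be close to 0.95.
    print(coverage(true_mean=10.0, sigma=2.0, n=25, trials=10_000))
```

Each simulated interval either contains the true mean or does not; the 95% figure emerges only as the long-run fraction across repetitions.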
How It Works and Key Distinctions
There is a direct duality between confidence intervals and two-sided hypothesis tests: any hypothesized parameter value outside a 95% confidence interval would be rejected by the corresponding two-sided test at the α = 0.05 level of significance. The width of the interval reflects precision; as sample size grows, SE decreases and the interval narrows. Because SE is proportional to 1/√n, quadrupling the sample size halves the width. Bayesian statistics uses credible intervals, which look similar but support a direct probability statement about the parameter, conditional on the data and the prior. Frequentist confidence intervals, by contrast, do not express probability about a single computed interval; they express the long-run frequency behavior of the procedure.
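The duality and the effect of sample size can be illustrated with a short sketch. It assumes a z-interval for a mean with known σ; the helper names `z_interval`, `two_sided_p`, and `agrees`, and all numeric inputs, are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_interval(xbar, sigma, n, level=0.95):
    """Interval for the mean, assuming known sigma."""
    se = sigma / math.sqrt(n)
    z = Z.inv_cdf(0.5 + level / 2)
    return xbar - z * se, xbar + z * se


def two_sided_p(xbar, mu0, sigma, n):
    """Two-sided z-test p-value for H0: mean == mu0."""
    z_stat = (xbar - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - Z.cdf(abs(z_stat)))


def agrees(xbar, mu0, sigma, n, alpha=0.05):
    """True when 'mu0 lies outside the interval' matches 'test rejects at alpha'."""
    lower, upper = z_interval(xbar, sigma, n, level=1 - alpha)
    outside = not (lower <= mu0 <= upper)
    rejected = two_sided_p(xbar, mu0, sigma, n) < alpha
    return outside == rejected


if __name__ == "__main__":
    # Illustrative values only.
    for mu0 in (4.0, 4.5, 5.0, 6.0):
        print(mu0, agrees(xbar=5.0, mu0=mu0, sigma=2.0, n=30))

    # Quadrupling n halves the interval width, since SE scales as 1/sqrt(n).
    lo_small, hi_small = z_interval(5.0, 2.0, 25)
    lo_large, hi_large = z_interval(5.0, 2.0, 100)
    print((hi_small - lo_small) / (hi_large - lo_large))
```

Values of `mu0` that sit exactly on an interval bound are avoided here, because floating-point rounding can make the two comparisons disagree at the boundary.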
Common Misuses and Misconceptions
The most common error is stating that a specific computed interval has a 95% probability of containing the true parameter. In the frequentist framework, the true parameter is a fixed but unknown constant; therefore, any particular computed interval either contains it or does not, and probability does not apply to a single interval. Another misconception is assuming all values within the interval are equally supported; in fact, values near the center have stronger evidential support. A further mistake is concluding that overlapping confidence intervals necessarily indicate the absence of a statistically significant difference. Two 95% intervals can overlap even when the difference between the means is significant at the 0.05 level, so a direct interval for the difference between the two means is required.
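The point about comparing two intervals can be made concrete with a small numerical sketch. The summary statistics below are invented purely for illustration, the samples are assumed independent, and the helper names `interval` and `difference_interval` are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def interval(estimate, se, z=Z95):
    """Normal-approximation interval around an estimate."""
    return estimate - z * se, estimate + z * se


def difference_interval(mean_a, se_a, mean_b, se_b, z=Z95):
    """Interval for mean_b - mean_a, assuming independent samples."""
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return interval(mean_b - mean_a, se_diff, z)


# Hypothetical summary statistics, chosen only for illustration.
ci_a = interval(0.0, 0.3)
ci_b = interval(1.0, 0.3)
ci_diff = difference_interval(0.0, 0.3, 1.0, 0.3)

overlap = ci_a[1] >= ci_b[0]    # the two group intervals share some values
excludes_zero = ci_diff[0] > 0  # the difference interval still excludes 0

print(ci_a, ci_b, ci_diff)
print(overlap, excludes_zero)
```

With these inputs the two group intervals overlap, yet the interval for the difference lies entirely above zero. The standard error of a difference is √(SE_a² + SE_b²), which is smaller than SE_a + SE_b, so comparing the two separate intervals is more conservative than testing the difference directly.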
Why It Matters in Research Practice
Confidence intervals provide information that a p-value alone cannot: they simultaneously convey the magnitude of an effect and the precision of the estimate. APA publication standards and many journals now require reporting confidence intervals rather than p-values alone. In clinical research, an interval that lies entirely within a range of clinically negligible effects indicates that any effect of the intervention is too small to matter in practice. In meta-analysis, the overlap of intervals across studies aids visual assessment of effect consistency. For these reasons, confidence intervals are an indispensable component of modern evidence-based research.
Key thinkers
- Jerzy Neyman (1894–1981): Polish-American statistician who formalized the theory of confidence intervals in 1937 and, with Egon Pearson, developed the Neyman-Pearson framework for hypothesis testing.
Sources
- Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society A, 236(767), 333–380. DOI: 10.1098/rsta.1937.0005