Confidence Intervals

A confidence interval is a range of plausible values for an unknown population quantity, computed from sample data so that the procedure used to build it would capture the true value a stated proportion of the time - conventionally 95% - across repeated samples. It expresses both the magnitude of an estimate and the uncertainty around it in a single, widely reported summary, and has become the preferred way to present effect estimates in the health sciences.

Definition

A confidence interval is an interval, calculated from sample data by a defined method at a stated confidence level, such that intervals produced by that method would contain the true population parameter in the stated proportion of hypothetical repeated samples.

Scope

This topic explains what a confidence interval is, how its confidence level should be interpreted, how interval width reflects precision and sample size, and the common ways the concept is misunderstood. It is presented as a reference methodology for appraising and reporting research, not as a clinical decision rule.

Core questions

What range of values for the parameter is plausibly consistent with the data?
What does the confidence level actually guarantee?
How do sample size and variability determine interval width?
How does a confidence interval relate to a hypothesis test or p-value?

Key concepts

Confidence level
Coverage probability
Interval width and precision
Lower and upper confidence limits
Frequentist interpretation
Relation to the null value
Exact versus approximate intervals
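
The last concept in the list, exact versus approximate intervals, can be illustrated for a binomial proportion. The sketch below is a minimal illustration, not a reference implementation: the function names (`binom_cdf`, `bisect_root`, `clopper_pearson`, `wald`) are hypothetical, the exact limits are found by simple bisection on the binomial distribution, and the approximate interval is the usual normal-approximation (Wald) form, estimate plus or minus a multiple of the standard error.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo, hi, tol=1e-10):
    """Locate p in [lo, hi] where f changes sign (f(lo), f(hi) must differ in sign)."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # sign change lies in the upper half
        else:
            hi = mid               # sign change lies in the lower half
    return (lo + hi) / 2


def clopper_pearson(x, n, level=0.95):
    """Exact (Clopper-Pearson) limits for x successes in n trials."""
    alpha = 1 - level
    # Lower limit: the p at which P(X >= x) equals alpha / 2 (0 when x == 0).
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit: the p at which P(X <= x) equals alpha / 2 (1 when x == n).
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper


def wald(x, n, level=0.95):
    """Approximate interval: p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


# Hypothetical example: 0 events observed in 10 trials.
print("exact:      ", clopper_pearson(0, 10))
print("approximate:", wald(0, 10))
```

With zero observed events the approximate interval collapses to a single point at zero, while the exact limits still leave room for a non-trivial true proportion. This is one reason exact methods are often preferred for small samples or extreme proportions.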

Mechanisms

A confidence interval is typically formed by taking a point estimate and extending it in both directions by a multiple of its standard error, where the multiple is set by the desired confidence level and the relevant sampling distribution (about 1.96 for a 95% interval when that distribution is approximately normal). The defining frequentist property is coverage: if the study were repeated many times, intervals built this way would contain the true parameter in the stated proportion of repetitions. Because the standard error shrinks as the sample grows or variability falls, the interval narrows accordingly, so width is a direct readout of precision. A common shortcut links intervals to tests - if a 95% interval for a difference excludes the null value, the corresponding two-sided test is significant at the 5% level, provided the interval and the test are built from the same method - but the interval conveys more by showing the whole range of compatible values. A frequent error is to read the confidence level as the probability that the true value lies inside one particular interval, which the frequentist definition does not support.
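
The construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data are invented, the function names (`z_interval`, `two_sided_p`) are hypothetical, and the multiple comes from the normal distribution, which is a large-sample approximation (for a sample this small a t-based multiple would usually be more appropriate). It also checks the link to the matching two-sided test.

```python
import math
from statistics import NormalDist, mean, stdev


def z_interval(estimate, std_error, level=0.95):
    """Return (lower, upper) = estimate -/+ z * SE (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 when level is 0.95
    return estimate - z * std_error, estimate + z * std_error


def two_sided_p(estimate, std_error, null_value=0.0):
    """Two-sided p-value from the matching z-test against null_value."""
    z_stat = abs(estimate - null_value) / std_error
    return 2 * (1 - NormalDist().cdf(z_stat))


# Invented data: paired differences in some outcome for ten subjects.
differences = [1.2, 0.4, 2.1, -0.3, 1.5, 0.9, 1.8, 0.2, 1.1, 0.7]

estimate = mean(differences)                                    # point estimate
std_error = stdev(differences) / math.sqrt(len(differences))    # SE of the mean

lower, upper = z_interval(estimate, std_error)
p_value = two_sided_p(estimate, std_error)
excludes_null = not (lower <= 0.0 <= upper)

print(f"estimate={estimate:.2f}, 95% CI=({lower:.2f}, {upper:.2f}), p={p_value:.4f}")
print("interval excludes 0:", excludes_null, "| p < 0.05:", p_value < 0.05)
```

Because the interval and the test use the same estimate, standard error and normal reference distribution, the two checks on the last line always agree. The interval additionally shows how far from the null value the compatible effects extend.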

Clinical relevance

Confidence intervals accompany most effect estimates in clinical and epidemiological reports, letting readers judge not just whether an effect is present but how large and how precisely estimated it is. A wide interval signals an inconclusive study even when a point estimate looks striking. This entry describes how intervals are constructed and interpreted and is not a basis for individual diagnostic or treatment decisions.

Evidence & guidelines

Reporting guidelines and editorial conventions in medicine now routinely expect effect estimates to be presented with confidence intervals. The American Statistical Association's 2016 statement on p-values and the 2016 misinterpretation guide by Greenland and colleagues both stress correct interpretation of intervals alongside p-values, building on Gardner and Altman's earlier advocacy for interval-based reporting.

History

The confidence interval was introduced by Jerzy Neyman in the 1930s as a frequentist approach to interval estimation, with early exact constructions such as the Clopper-Pearson limits for a binomial proportion appearing in 1934. Its routine use in medicine came later in the twentieth century, driven notably by Gardner and Altman's 1986 case for reporting intervals rather than bare p-values, which reshaped journal conventions.

Debates

Misinterpretation of the confidence level: The confidence level describes the long-run performance of the interval-building procedure, not the probability that a particular computed interval contains the true value; this distinction is widely misunderstood and a recurring source of error.
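
The long-run reading of the confidence level can be checked by simulation. The sketch below is illustrative only: the population values, sample size, number of repetitions and the function name `coverage` are all assumptions chosen for the example. It draws many samples from a population whose mean is known, builds a normal-approximation 95% interval from each, and counts how often the interval contains that known mean.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def coverage(true_mean=10.0, true_sd=2.0, n=100, repetitions=2000,
             level=0.95, seed=0):
    """Proportion of simulated intervals that contain true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        centre = fmean(sample)
        half_width = z * stdev(sample) / math.sqrt(n)
        # Each individual interval either contains the true mean or it does not.
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / repetitions


observed = coverage()
print(f"proportion of intervals containing the true mean: {observed:.3f}")
```

The proportion should land close to 0.95. The 95% is a property of the whole collection of intervals: any single interval in the loop either covers the true mean or misses it, which is why the level cannot be read as a probability statement about one computed interval.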

Key figures

Jerzy Neyman
Egon Pearson
Martin J. Gardner
Douglas G. Altman
Sander Greenland

Seminal works

Clopper and Pearson (1934)
Gardner and Altman (1986)

Frequently asked questions

Does a 95% confidence interval mean there is a 95% chance the true value is inside it? No. Under the frequentist definition the true value is fixed, and the 95% refers to the long-run proportion of such intervals, built the same way across repeated samples, that would contain it. It is not the probability that one specific interval contains the true value.
What makes a confidence interval narrow or wide? Mainly sample size and variability: larger samples and less variable data give narrower, more precise intervals, while small or noisy studies produce wide intervals that signal uncertainty.
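
As a rough numerical illustration of the second answer, the sketch below computes the half-width of a normal-approximation 95% interval for a mean at several sample sizes. The standard deviation of 10 and the function name `half_width` are assumptions made for the example.

```python
import math
from statistics import NormalDist


def half_width(sd, n, level=0.95):
    """Half-width z * sd / sqrt(n) of a normal-approximation interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sd / math.sqrt(n)


for n in (25, 100, 400):
    print(n, round(half_width(sd=10.0, n=n), 2))
# 25 3.92
# 100 1.96
# 400 0.98
```

Under this approximation, quadrupling the sample size halves the width, and doubling the variability doubles it.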