Interpreting p-values, Confidence Intervals, and Effect Sizes Together

Reading the Whole Picture, Not One Number

A p-value alone is a weak basis for conclusions. Read together, the three indicators tell a fuller story: the p-value indicates whether an observed effect is distinguishable from chance; the effect size states how large or meaningful that effect is; and the confidence interval reveals the precision of the estimate and the range of plausible values. In large samples, a tiny and unimportant effect can appear statistically significant; in small samples, a genuine and meaningful effect may fail to reach the significance threshold.
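The sample-size point can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes two equal-sized groups and uses a normal approximation to the two-sample test (z = d * sqrt(n / 2)), which somewhat understates the p-value in small samples. The function name and the example numbers are hypothetical, not taken from any study.

```python
from math import sqrt
from statistics import NormalDist


def approx_two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a standardised difference d.

    Assumption of this sketch: two equal groups of size n_per_group and a
    normal approximation to the test statistic.
    """
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Tiny effect, very large sample: statistically significant.
p_large = approx_two_sided_p(d=0.02, n_per_group=100_000)

# Sizeable effect, small sample: not significant at the 0.05 level.
p_small = approx_two_sided_p(d=0.60, n_per_group=10)

print(f"d = 0.02, n = 100000 per group: p ~ {p_large:.2g}")
print(f"d = 0.60, n = 10 per group:     p ~ {p_small:.2g}")
```

The first case falls far below 0.05 despite a negligible effect; the second stays above 0.05 despite a medium-sized one.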

Defining the Three Indicators and Their Logic

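The p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true; formally, for a test statistic T with observed value t_obs, the two-sided p-value is p = P(|T| >= |t_obs| | H0). It is not the probability of the data themselves, and it is not P(H0 | data). Effect size (for example Cohen's d = (M1 - M2) / SD_pooled, or Pearson's r) expresses the magnitude of a difference or association in a standardised, unit-free metric, which is the starting point for judging practical importance. The confidence interval reports a range of parameter values compatible with the data at a chosen confidence level, commonly 95%: across repeated samples, about 95% of intervals constructed this way would contain the true parameter. These three indicators complement one another: any one of them, read in isolation, provides an incomplete picture.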
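As a concrete illustration of the effect-size formula, the following minimal sketch computes Cohen's d from two raw samples using the pooled standard deviation. The function name and the toy data are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
from math import sqrt
from statistics import mean, variance


def cohen_d(group1, group2):
    """Cohen's d = (M1 - M2) / SD_pooled for two independent samples."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


# Toy data (hypothetical): means 4 and 3, both sample variances equal to 4.
treatment = [2, 4, 6]
control = [1, 3, 5]

print(cohen_d(treatment, control))  # 0.5
```

Here the mean difference is 1 and the pooled standard deviation is 2, so d = 0.5, a "medium" effect by the usual benchmarks.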

How to Read and Interpret Them Together

When interpreting a finding, ask three questions in sequence: (1) Is the p-value small? If so, the effect is distinguishable from chance, but that alone is not enough. (2) How large is the effect size? A rough guide for Cohen's d: 0.2 is small, 0.5 is medium, 0.8 is large. (3) How wide is the confidence interval, and where does it lie relative to zero? A narrow interval signals high precision; an interval that includes zero means the data are compatible with no effect, and with an effect in either direction. For example, d = 0.15, p < 0.01, and 95% CI [0.05, 0.25] is statistically significant yet may be practically negligible, because even the upper bound is only a small effect; d = 0.60, p = 0.08, and 95% CI [-0.05, 1.25] points to a potentially meaningful but imprecise effect.
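The three questions can be written down as a small helper. This is only a sketch of the reading procedure described above, not a substitute for judgement: the function name, the returned keys, and the label "below small" are hypothetical, and the cut-offs are the rough Cohen's d benchmarks quoted in the text.

```python
def read_together(d, p, ci_low, ci_high, alpha=0.05):
    """Summarise a finding using the p-value, effect size, and interval."""
    size = abs(d)
    if size < 0.2:
        magnitude = "below small"
    elif size < 0.5:
        magnitude = "small"
    elif size < 0.8:
        magnitude = "medium"
    else:
        magnitude = "large"
    return {
        "significant": p < alpha,                    # question 1
        "magnitude": magnitude,                      # question 2
        "ci_width": round(ci_high - ci_low, 2),      # question 3: precision
        "ci_includes_zero": ci_low <= 0 <= ci_high,  # question 3: direction
    }


# The two examples from the text. p = 0.009 is a stand-in for "p < 0.01".
precise_but_tiny = read_together(d=0.15, p=0.009, ci_low=0.05, ci_high=0.25)
large_but_vague = read_together(d=0.60, p=0.08, ci_low=-0.05, ci_high=1.25)

print(precise_but_tiny)
print(large_but_vague)
```

The first finding comes back as significant, below the "small" benchmark, with a narrow interval that excludes zero; the second as non-significant, medium in size, with a wide interval that includes zero.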

Common Misconceptions and Misuses

The most common misconception is treating the p-value as a measure of the size of an effect or as the probability that the hypothesis is true. In reality, the p-value is a statement about how surprising the data would be if the null hypothesis held; it conveys no direct information about effect magnitude or about the truth of either hypothesis. Another frequent error is treating the p < 0.05 threshold as a sharp boundary between real and spurious results, when p = 0.049 and p = 0.051 represent essentially the same strength of evidence. Beyond these, publishing only statistically significant findings (publication bias) inflates apparent effect sizes in the literature, because small studies reach significance only when they happen to overestimate the effect. The American Statistical Association's statement (Wasserstein & Lazar, 2016) explicitly warns that a p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
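The inflation caused by publication bias can be seen in a small simulation. The sketch below rests on stated assumptions: a true standardised effect of 0.2, 20 participants per group, normally distributed scores with unit variance, and a "publish only if p < 0.05" filter implemented through the two-sided critical t value for 38 degrees of freedom (about 2.024). All names and parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance

TRUE_D = 0.2       # assumed true standardised effect
N_PER_GROUP = 20   # assumed participants per group
T_CRIT = 2.024     # two-sided 5% critical value of t with 38 df
N_STUDIES = 2000   # number of simulated studies


def simulate_study(rng):
    """Return (observed d, whether the study reached significance)."""
    treatment = [rng.gauss(TRUE_D, 1.0) for _ in range(N_PER_GROUP)]
    control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    # Equal group sizes, so the pooled variance is the simple average.
    pooled_sd = sqrt((variance(treatment) + variance(control)) / 2)
    d = (mean(treatment) - mean(control)) / pooled_sd
    t = d * sqrt(N_PER_GROUP / 2)
    return d, abs(t) > T_CRIT


rng = random.Random(42)  # fixed seed so the run is repeatable
results = [simulate_study(rng) for _ in range(N_STUDIES)]

all_d = [d for d, _ in results]
published_d = [d for d, significant in results if significant]

print(f"true d:                      {TRUE_D}")
print(f"mean d, all studies:         {mean(all_d):.2f}")
print(f"mean d, significant only:    {mean(published_d):.2f}")
print(f"share reaching significance: {len(published_d) / N_STUDIES:.1%}")
```

Under these assumptions the average across all studies stays near the true value, while the average among the "published" subset is much larger, since a study of this size can only reach significance when its observed d exceeds roughly 0.64 in absolute value.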

Best Practice in Reporting

The American Psychological Association (APA) and major statistical organisations recommend that researchers report the p-value, effect size, and confidence interval together for every key finding. A model sentence reads: t(58) = 2.34, p = 0.023, d = 0.61, 95% CI [0.09, 1.12]. This format communicates the strength of evidence against the null hypothesis, the magnitude of the effect, and the precision of the estimate in a single line. Shifting the language of interpretation away from "significant" versus "not significant" and toward effect size and interval width improves the quality of scientific communication and guards against over-reliance on the threshold.
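To show how the pieces of such a line relate to one another, the sketch below recovers an approximate d and its confidence interval from a reported t statistic. It assumes two independent groups of 30 (one way to obtain 58 degrees of freedom; the text does not state the group sizes) and uses a common large-sample approximation for the standard error of d. The function name is hypothetical. The p-value is not computed here because the Python standard library has no t distribution; in practice it would come from statistical software.

```python
from math import sqrt
from statistics import NormalDist


def d_and_ci_from_t(t, n1, n2, level=0.95):
    """Approximate Cohen's d and its confidence interval from a t statistic.

    Assumptions of this sketch: independent groups, pooled-variance t test,
    and the large-sample standard error
    SE(d) = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))).
    """
    d = t * sqrt(1 / n1 + 1 / n2)
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return d, d - z * se, d + z * se


n1 = n2 = 30  # assumed group sizes, giving df = 58
d, low, high = d_and_ci_from_t(t=2.34, n1=n1, n2=n2)

line = f"t({n1 + n2 - 2}) = 2.34, d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]"
print(line)  # t(58) = 2.34, d = 0.60, 95% CI [0.09, 1.12]
```

Under these assumptions the result is close to the model sentence; small differences in d can arise from unequal group sizes or from a different estimator of the standardised difference.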

Sources

  1. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: context, process, and purpose. The American Statistician, 70(2), 129-133. DOI: 10.1080/00031305.2016.1154108