Cohen's d and the Standardized Mean Difference

The magnitude of a difference between two means

Cohen's d expresses the difference between two group means in standard-deviation units, which makes effect magnitudes comparable across studies that use different scales or measures. A significance test indicates whether an observed difference is larger than chance alone would plausibly produce; Cohen's d indicates how large that difference is. Cohen's rough benchmarks (0.2 small, 0.5 medium, 0.8 large) are conventions and starting points, not universal laws, and should always be interpreted in the context of the research domain.

The Concept and Formula

Cohen's d is a standardized effect-size measure obtained by dividing the difference between two group means by their pooled standard deviation. The formula is d = (M1 - M2) / SD_pooled, where SD_pooled = sqrt(((n1 - 1) * SD1^2 + (n2 - 1) * SD2^2) / (n1 + n2 - 2)). In other words, SD_pooled is the square root of the average of the two group variances, weighted by degrees of freedom; it is not a weighted average of the standard deviations themselves. The result is a dimensionless number, which allows direct comparison across studies that use different tests, scales, or units. In small samples, d tends to slightly overestimate the magnitude of the true population effect, which is why bias-corrected estimators such as Hedges' g are preferred in those situations.
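The definition above can be sketched in a few lines of Python using only the standard library. The function names (pooled_sd, cohens_d, hedges_g) and the sample data are illustrative assumptions, not part of any package, and the Hedges' g correction uses the common approximation 1 - 3 / (4 * df - 1) rather than the exact gamma-function form.

```python
import math
from statistics import mean, variance


def pooled_sd(x, y):
    """Square root of the df-weighted average of the two sample variances."""
    n1, n2 = len(x), len(y)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def cohens_d(x, y):
    """Standardized mean difference; positive when mean(x) > mean(y)."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


def hedges_g(x, y):
    """Cohen's d multiplied by an approximate small-sample bias correction."""
    df = len(x) + len(y) - 2
    correction = 1 - 3 / (4 * df - 1)  # approximation, assumed adequate here
    return cohens_d(x, y) * correction


# Hypothetical data chosen so the arithmetic is easy to check by hand.
treatment = [2, 4, 6]
control = [1, 3, 5]
print(f"d = {cohens_d(treatment, control):.2f}")  # d = 0.50
print(f"g = {hedges_g(treatment, control):.2f}")  # g = 0.40
```

With three observations per group the correction shrinks d by 20 percent, which illustrates why the choice between d and g matters mainly in small samples.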

Computing and Reporting

To compute Cohen's d, first obtain the mean and standard deviation for each group, then calculate the pooled standard deviation. In R, the effsize and effectsize packages handle this automatically; in Python, pingouin provides a ready-made function (compute_effsize), whereas SciPy does not appear to ship a dedicated Cohen's d function, so there d is usually computed directly from the group means and standard deviations. Reporting d alone is insufficient: always accompany it with a 95% confidence interval. A standard reporting format looks like: Cohen's d = 0.45, 95% CI [0.21, 0.69]. When Hedges' g is used instead, state this explicitly. In meta-analyses, always specify the effect-size type and the variance estimation method.
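As a rough illustration of the reporting step, the sketch below builds a confidence interval from a widely used large-sample approximation to the standard error of d, sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))), combined with a normal quantile. This is an assumption made for brevity: exact intervals are based on the noncentral t distribution and will differ somewhat in small samples. The function names and the group sizes of 50 are hypothetical.

```python
import math
from statistics import NormalDist


def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate CI for Cohen's d (large-sample normal approximation)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return d - z * se, d + z * se


def format_report(d, n1, n2, level=0.95):
    """Render the effect size and its interval in a conventional format."""
    lo, hi = d_confidence_interval(d, n1, n2, level)
    return f"Cohen's d = {d:.2f}, {level:.0%} CI [{lo:.2f}, {hi:.2f}]"


# Hypothetical study: d = 0.45 with 50 participants per group.
print(format_report(0.45, 50, 50))
```

The interval width depends on the group sizes, so the same d is estimated far more precisely in a large study than in a small one.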

Common Misconceptions

The most common error is treating Cohen's benchmarks as universal thresholds. Cohen offered these values as rough conventions for behavioral-science research, to be used when no better basis for judging magnitude is available, and he himself emphasized that they depend on context. A second mistake is equating a large d with practical importance: a big effect size can appear on a trivial variable, and a small one can matter a great deal on a consequential outcome. A third misconception is conflating statistical significance with effect size: a small d can be statistically significant in a large sample, while a large d may not reach significance in a small one. Finally, the sign of d carries no evaluative meaning: it depends only on which group's mean is subtracted from the other's, so the order of subtraction should be stated when d is reported.
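The third point can be made concrete with a small sketch. For the pooled-variance two-sample t-test, the test statistic is t = d * sqrt(n1 * n2 / (n1 + n2)), so the same d yields very different t values at different sample sizes. The critical values below are taken from standard t tables and hard-coded as assumptions; the sample sizes are hypothetical.

```python
import math


def t_from_d(d, n1, n2):
    """t statistic of the pooled-variance two-sample t-test implied by d."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


# Two-sided 5% critical values (assumed, from standard tables).
T_CRIT_DF_8 = 2.306      # df = 5 + 5 - 2
T_CRIT_LARGE_DF = 1.96   # normal approximation, adequate for df in the thousands

small_d_large_n = t_from_d(0.1, 2000, 2000)  # roughly 3.16
large_d_small_n = t_from_d(0.8, 5, 5)        # roughly 1.26

print(small_d_large_n > T_CRIT_LARGE_DF)  # True: d = 0.1 is significant
print(large_d_small_n > T_CRIT_DF_8)      # False: d = 0.8 is not
```

A d of 0.1 clears the threshold with 2000 participants per group, while a d of 0.8 does not with 5 per group, even though the second effect is eight times larger.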

Why It Matters and How to Use It

Cohen's d provides the magnitude information that a p-value alone cannot convey, and it forms the foundation of meta-analysis, power analysis, and sample-size planning. A researcher designing a future study can use d values from the literature to estimate the required sample size. When field-specific norms are known, these should be preferred over generic benchmarks; in educational research, for example, a d of 0.4 may already represent a meaningful intervention effect. Presenting the effect size alongside a confidence interval and a contextual interpretation makes findings easier to interpret and easier to reuse in later meta-analyses and study planning.
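For sample-size planning, a minimal sketch under simplifying assumptions is shown below: two equal-sized independent groups, a two-sided test, and the normal approximation n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2 per group. The function name and defaults are illustrative. Exact calculations based on the noncentral t distribution, as implemented in dedicated power-analysis software, typically give a result that is one or two participants larger per group.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Expected effect sizes a planner might take from earlier literature.
for d in (0.2, 0.4, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Because the required n scales with 1 / d^2, halving the expected effect size roughly quadruples the sample needed, which is why an overly optimistic d from the literature can leave a study badly underpowered.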

Sources

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates. ISBN: 978-0-8058-0283-2