Heterogeneity in Meta-Analysis
Heterogeneity in meta-analysis is the variation in true effects across the studies being pooled, beyond what sampling error alone would produce. Measuring and interpreting it tells the analyst whether the studies are estimating essentially the same thing or genuinely different things, which shapes both the model used and the confidence placed in the summary.
Definition
Heterogeneity is the degree to which the true effects estimated by individual studies in a meta-analysis differ from one another, quantified by statistics such as Cochran's Q, I-squared (the proportion of total variation due to between-study differences rather than chance), and tau-squared (the estimated between-study variance).
Scope
This entry covers the statistical assessment of between-study heterogeneity: the Cochran Q test, the I-squared statistic, the between-study variance tau-squared, and the known limitations of these measures. It treats heterogeneity as a methodological topic within evidence synthesis and offers reference description, not clinical advice.
Core questions
- Do the included studies estimate one common effect or a range of different effects?
- How much of the observed variation is real between-study difference versus sampling noise?
- How should I-squared and tau-squared be interpreted, and where do they mislead?
- When does heterogeneity make a single pooled estimate inappropriate?
Key concepts
- Cochran's Q test
- I-squared statistic
- Tau-squared (between-study variance)
- Clinical versus statistical heterogeneity
- Prediction interval
- Subgroup analysis as a response to heterogeneity
Mechanisms
The total variation among study estimates is partitioned into within-study sampling error and genuine between-study variation. Cochran's Q compares the observed dispersion with what sampling error alone would predict. Because Q has low power when few studies are pooled, Higgins and Thompson proposed I-squared, the percentage of total variation attributable to between-study heterogeneity rather than chance, which does not depend directly on the number of studies. Tau-squared estimates the variance of the underlying effect distribution and feeds directly into random-effects weighting and prediction intervals. Two caveats matter. Rücker and colleagues show that I-squared depends on the precision of the included studies, so it can be large simply because the studies are precise. Von Hippel shows that it is unstable and can be biased in small meta-analyses. These statistics should therefore be read alongside the absolute spread of effects rather than against fixed thresholds.
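The partition described above can be made concrete with a short calculation. The following is a minimal sketch, not a reference implementation: it assumes each study supplies an effect estimate and its within-study variance, and it uses the DerSimonian-Laird moment estimator for tau-squared, which is one common choice that this entry does not itself prescribe. The function name `heterogeneity` and the example numbers are illustrative only.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, I-squared (percent),
    and a DerSimonian-Laird estimate of tau-squared (an assumed estimator)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Inverse-variance (fixed-effect) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = k - 1

    # I-squared: share of Q in excess of its expectation under homogeneity,
    # truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird moment estimator of the between-study variance.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    return {"Q": q, "df": df, "I2": i2, "tau2": tau2}


# Illustrative (made-up) data: three equally precise studies.
example = heterogeneity([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(example)
```

For these made-up inputs Q is about 8 on 2 degrees of freedom, I-squared is about 75%, and tau-squared is about 0.03. The same I-squared could arise from a very different absolute spread if the studies were more or less precise, which is why tau-squared is reported alongside it.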
Clinical relevance
Whether and how a body of trials is summarised depends heavily on its heterogeneity, so appraising heterogeneity statistics is part of judging how much weight a pooled result deserves in guidelines and health technology assessment. This entry describes how heterogeneity is measured and is not a basis for individual clinical decisions.
Evidence & guidelines
The Cochrane Handbook describes expected practice for assessing and reporting heterogeneity, including the use of I-squared with cautionary interpretation and the role of prediction intervals, consistent with the methodological literature summarised here.
History
Cochran's Q test, developed for combining experiments, dates from the mid-twentieth century, but it proved underpowered for the small numbers of studies common in clinical meta-analysis. Higgins and Thompson's 2002 paper, followed by the widely cited 2003 BMJ exposition, introduced I-squared as an interpretable measure intended not to depend on the number of studies. A corrective literature (Rücker et al., 2008; von Hippel, 2015) then clarified its dependence on study precision and its instability in small syntheses.
Debates
- How much should I-squared be relied on to judge heterogeneity?
- I-squared depends on the precision of the included studies and can be unstable when few studies are pooled, so commentators warn against fixed cut-offs and recommend reading it together with tau-squared and the absolute spread of effects.
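The dependence on precision can be illustrated with a simplified relation: if every study had the same within-study variance, I-squared would be approximately tau-squared divided by the sum of tau-squared and that within-study variance. The sketch below assumes this equal-variance simplification; the function name `approx_i2` and the numbers are hypothetical, chosen only to show the direction of the effect.

```python
def approx_i2(tau2, within_var):
    """Approximate I-squared (percent), assuming all studies share the same
    within-study variance `within_var` (a simplifying assumption)."""
    if tau2 < 0 or within_var <= 0:
        raise ValueError("tau2 must be >= 0 and within_var must be > 0")
    return 100.0 * tau2 / (tau2 + within_var)


# Hold the between-study variance fixed and make the studies more precise.
tau2 = 0.04
for within_var in (0.16, 0.04, 0.01, 0.0025):
    print(f"within-study variance {within_var}: I2 ~ {approx_i2(tau2, within_var):.1f}%")
```

With the between-study variance held fixed, the approximate I-squared climbs from 20% to about 94% purely because the studies become more precise, which is the pattern the Rücker critique describes.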
Key figures
- Julian Higgins
- Simon Thompson
- Gerta Rücker
- Paul von Hippel
- William Cochran
Related topics
Seminal works
- higgins-thompson-2002
- higgins-2003
Frequently asked questions
- What does an I-squared of 75% mean?
- It indicates that about three-quarters of the total variation among study estimates reflects genuine between-study differences rather than sampling error. Because I-squared depends on study precision, however, it should be interpreted alongside the actual spread of effects, not against a fixed label.
- Is high heterogeneity a reason not to pool studies?
- Not automatically. High heterogeneity signals that studies differ and prompts investigation of why, but whether to pool, to use a random-effects model, or to refrain depends on whether the differences are explicable and the studies clinically comparable.
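When a random-effects model is chosen, the absolute spread of effects can be expressed as a prediction interval around the pooled estimate. The sketch below is hedged accordingly: it assumes tau-squared has already been estimated by some method, uses the commonly cited form (pooled estimate plus or minus a t quantile times the square root of tau-squared plus the squared standard error), and expects the caller to supply the t critical value for k - 2 degrees of freedom from a table or statistics library. The function name and inputs are hypothetical.

```python
import math


def random_effects_summary(effects, variances, tau2, t_crit):
    """Random-effects pooled estimate, its standard error, and an approximate
    prediction interval. `t_crit` is the caller-supplied t quantile for
    k - 2 degrees of freedom (an assumption of this sketch)."""
    k = len(effects)
    if k < 3 or k != len(variances):
        raise ValueError("need at least three studies with matching variances")

    # Random-effects weights add tau-squared to each within-study variance.
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    mu = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se_mu = math.sqrt(1.0 / sum_w)

    # Prediction interval for the true effect in a new, comparable study.
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return mu, se_mu, (mu - half_width, mu + half_width)


# Illustrative (made-up) data; 12.706 is the 97.5% t quantile for 1 df.
mu, se_mu, interval = random_effects_summary(
    [0.1, 0.3, 0.5], [0.01, 0.01, 0.01], tau2=0.03, t_crit=12.706
)
print(mu, se_mu, interval)
```

With only three studies the t quantile is very large, so the interval is wide. That is an honest reflection of how little a small synthesis says about the range of true effects, and it is one reason a single pooled number can mislead when heterogeneity is substantial.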