Subgroup and Sensitivity Analysis in Reviews
Exploring heterogeneity and testing robustness
When studies disagree in a systematic review, two key tools come into play. Subgroup analysis splits studies by characteristics such as population, dose, or design to ask whether effect sizes differ across groups. Sensitivity analysis re-runs the synthesis under different reasonable choices — excluding low-quality studies, switching statistical models — to test how robust the conclusions are. Both must be planned prospectively in the review protocol to avoid data-driven false findings.
Core Concepts: What Are Subgroup and Sensitivity Analyses?
Subgroup analysis partitions the studies within a primary meta-analysis according to a prespecified characteristic — such as participant age, disease severity, or intervention dose — and calculates a separate effect estimate for each subset. The goal is to explain the source of observed heterogeneity. Meta-regression extends this idea by modelling effect size against continuous or categorical study-level moderators. Sensitivity analysis asks a different question: would the conclusion change if a specific methodological decision had been made differently? The two approaches are complementary — one aims to explain heterogeneity, the other to verify the reliability of conclusions.
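As a minimal sketch of the partitioning step, the Python below groups studies by a label and pools each subset with inverse-variance (fixed-effect) weights. The function names, the subgroup labels, and the study data are all hypothetical and chosen only for illustration; a real review would normally use an established meta-analysis package.

```python
from collections import defaultdict

def pool_fixed(effects, variances):
    """Inverse-variance (fixed-effect) pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    return pooled, 1.0 / total_w

def pool_by_subgroup(studies):
    """studies: iterable of (subgroup_label, effect, variance) tuples."""
    groups = defaultdict(list)
    for label, effect, variance in studies:
        groups[label].append((effect, variance))
    return {
        label: pool_fixed([e for e, _ in rows], [v for _, v in rows])
        for label, rows in groups.items()
    }

# Hypothetical standardized mean differences and variances (illustration only).
studies = [
    ("adults", 0.30, 0.04),
    ("adults", 0.50, 0.04),
    ("children", 0.10, 0.02),
    ("children", 0.20, 0.02),
]

results = pool_by_subgroup(studies)
for label, (estimate, variance) in results.items():
    print(f"{label}: pooled effect = {estimate:.2f}, SE = {variance ** 0.5:.3f}")
```

Each subgroup gets its own pooled estimate and standard error. Whether the estimates differ by more than chance is a separate question that needs a formal between-subgroup test.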
How to Conduct and Interpret These Analyses
In subgroup analysis, studies are pooled separately within each subset, usually shown as grouped sections of a forest plot, and a formal test of between-subgroup heterogeneity is applied, typically a Q-test for subgroup differences. I² can be reported within each subgroup to describe the remaining heterogeneity, but it is a descriptive statistic, not a test. A statistically significant between-subgroup Q suggests a moderator effect. In sensitivity analysis, researchers typically compare fixed-effect versus random-effects model results, estimates with and without publication-bias correction, and the pooled effect after removing a particular high-weight study. If results are consistent across scenarios, the conclusions can be regarded as robust to those choices. Large deviations signal methodological fragility and must be reported transparently.
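Two of the sensitivity checks described above, the model comparison and the removal of single studies, can be sketched in a few lines of Python. The sketch assumes the DerSimonian-Laird estimator for the between-study variance, which is one common choice and not the only one. The function names and the data are hypothetical.

```python
def fixed_effect(effects, variances):
    """Inverse-variance pooled estimate."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def dl_tau2(effects, variances):
    """DerSimonian-Laird between-study variance, truncated at zero."""
    if len(effects) < 2:
        return 0.0  # cannot be estimated from a single study
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    return max(0.0, (q - (len(effects) - 1)) / c)

def random_effects(effects, variances):
    """Random-effects pooled estimate using the DL variance estimate."""
    tau2 = dl_tau2(effects, variances)
    return fixed_effect(effects, [v + tau2 for v in variances])

def leave_one_out(effects, variances, pool=random_effects):
    """Pooled estimate with each study removed in turn."""
    return [
        pool(effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:])
        for i in range(len(effects))
    ]

# Hypothetical effect sizes and variances (illustration only).
effects = [0.10, 0.30, 0.35, 0.60, 0.90]
variances = [0.01, 0.02, 0.02, 0.03, 0.05]

print(f"fixed-effect:   {fixed_effect(effects, variances):.3f}")
print(f"random-effects: {random_effects(effects, variances):.3f}")
for i, estimate in enumerate(leave_one_out(effects, variances), start=1):
    print(f"without study {i}: {estimate:.3f}")
```

If the fixed-effect and random-effects estimates diverge sharply, or one omission shifts the pooled effect noticeably, that is the kind of fragility the review should report.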
A Concrete Example
Suppose a pain management meta-analysis finds a moderate overall effect size but high heterogeneity (I² = 72%). In a prespecified subgroup analysis, researchers split studies by intervention type: pharmacological, exercise, and psychological. Within the exercise subgroup, I² drops to 18% and the effect is larger, suggesting that intervention type explains a substantial portion of the heterogeneity. A subsequent sensitivity analysis excludes the single study rated at high risk of bias, and the summary effect changes negligibly, indicating that the conclusion does not depend on that study.
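The arithmetic behind an example of this kind can be sketched as follows: Cochran's Q is computed overall and within each subgroup, their difference gives the between-subgroup Q, and I² is derived from Q and its degrees of freedom. The numbers are toy values invented for illustration and are not meant to reproduce the 72% and 18% figures above. The helper names are hypothetical, and the chi-square tail function is a small closed-form implementation for integer degrees of freedom.

```python
import math

def cochran_q(effects, variances):
    """Cochran's Q around the fixed-effect pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

def i_squared(q, df):
    """I² as a percentage, truncated at zero."""
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

def subgroup_test(groups):
    """groups: dict mapping label -> (effects, variances).
    Returns (Q_between, df, p_value) under fixed-effect weights."""
    all_effects = [e for es, _ in groups.values() for e in es]
    all_variances = [v for _, vs in groups.values() for v in vs]
    q_total = cochran_q(all_effects, all_variances)
    q_within = sum(cochran_q(es, vs) for es, vs in groups.values())
    q_between = q_total - q_within
    df = len(groups) - 1
    return q_between, df, chi2_sf(q_between, df)

# Toy data by intervention type (illustration only).
groups = {
    "pharmacological": ([0.2, 0.4], [0.04, 0.04]),
    "exercise": ([0.8, 1.0], [0.04, 0.04]),
    "psychological": ([0.4, 0.6], [0.04, 0.04]),
}

q_between, df, p_value = subgroup_test(groups)
print(f"Q_between = {q_between:.2f}, df = {df}, p = {p_value:.4f}")
for label, (es, vs) in groups.items():
    q = cochran_q(es, vs)
    print(f"{label}: I² = {i_squared(q, len(es) - 1):.0f}%")
```

A small p-value for the between-subgroup Q, together with low I² within subgroups, is the pattern that would support intervention type as a moderator.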
Common Pitfalls and Good Practice
The most common mistake is selecting subgroups after examining the data rather than prespecifying them. This practice inflates false-positive rates through multiple comparisons, so any post hoc subgroup analysis must be disclosed explicitly as exploratory. A second error is reporting subgroup differences without a formal test: between-subgroup heterogeneity should be tested at the meta-analysis level, not inferred by comparing individual studies. Good practice requires predefining all moderators and scenarios in the protocol, applying corrections for multiple comparisons, clearly distinguishing exploratory from confirmatory findings, and reporting every sensitivity analysis, whether or not it supports the main conclusion.
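One way to apply a correction for multiple comparisons across several subgroup tests is the Holm step-down procedure, sketched below. Holm is only one of several accepted corrections and is used here as an assumption. The moderator names and p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted

# Hypothetical p-values from three prespecified between-subgroup tests.
moderators = ["age group", "dose", "study design"]
raw_p = [0.01, 0.04, 0.03]

for name, raw, adj in zip(moderators, raw_p, holm_adjust(raw_p)):
    print(f"{name}: raw p = {raw:.3f}, Holm-adjusted p = {adj:.3f}")
```

Reporting both the raw and the adjusted p-values lets readers see how much a subgroup finding depends on the number of tests performed.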
Key terms
- Subgroup Analysis: Partitioning studies by a shared characteristic to compute a separate effect estimate per subset.
- Sensitivity Analysis: Re-running the synthesis under varied methodological decisions to test how robust conclusions remain.
- Meta-Regression: A statistical method modelling effect size against continuous or categorical study-level moderators.
- Moderator: A study-level characteristic that explains variation in effect sizes across subgroups.
- Robustness: The degree to which conclusions remain consistent when reasonable analytical choices are varied.