Causal Identification Strategies
Recovering causes from observational data
Researchers often cannot run experiments or randomize, yet they still ask causal questions. Causal identification strategies exploit structural features of observational data (natural experiments, cutoff rules, variation over time) to limit confounding. Instrumental variables, difference-in-differences, regression discontinuity, and matching each rest on specific assumptions that must be explicitly argued. These strategies do not prove causality; they support a causal interpretation only under stated, sometimes untestable, conditions.
What Is Causal Identification?
Causality is different from correlation. Two variables may move together because one causes the other, because a common third variable drives both, or by chance alone. Causal identification refers to a set of analytic strategies that try to distinguish these possibilities. Randomized controlled experiments are the most reliable way to neutralize confounders: random assignment makes treatment independent of them, so they are balanced across groups in expectation. When randomization is impossible, researchers instead exploit specific structural features of the available data, looking for a setting that approximates an experiment, in order to recover a causal effect. Each strategy requires assumptions that must always be stated explicitly.
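The difference between a confounded association and a randomized comparison can be seen in a small simulation. This is a minimal sketch on invented data: the variable names, the coefficient on the confounder, and the assumed true effect of 1.0 are illustrative choices, not taken from any study.

```python
import random


def slope(x, y):
    """OLS slope of y on x (sample covariance divided by sample variance)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
n = 20_000
true_effect = 1.0  # assumed causal effect of exposure on outcome

# A confounder u that raises the outcome in both scenarios.
u = [rng.gauss(0, 1) for _ in range(n)]

# Observational setting: u also drives the exposure.
x_obs = [ui + rng.gauss(0, 1) for ui in u]
y_obs = [true_effect * xi + 2.0 * ui + rng.gauss(0, 1)
         for xi, ui in zip(x_obs, u)]

# Randomized setting: the exposure is assigned independently of u.
x_rct = [rng.gauss(0, 1) for _ in range(n)]
y_rct = [true_effect * xi + 2.0 * ui + rng.gauss(0, 1)
         for xi, ui in zip(x_rct, u)]

naive = slope(x_obs, y_obs)       # absorbs the confounder's contribution
randomized = slope(x_rct, y_rct)  # close to true_effect

print(f"observational slope: {naive:.2f}")
print(f"randomized slope:    {randomized:.2f}")
```

Under these assumed parameters the observational slope should land near 2 rather than 1, because the confounder's influence is folded into the association, while the randomized slope should sit close to the true effect.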
Four Core Strategies
- Instrumental variables (IV): use a factor (the instrument) that shifts the exposure, is unrelated to unobserved confounders, and affects the outcome only through the exposure, creating variation that mimics randomization.
- Difference-in-differences (DiD): compares changes over time between a treated and a control group, removing time-invariant unobserved differences under a common trends assumption.
- Regression discontinuity (RD): compares units just above and just below a cutoff that determines assignment, assuming units near the threshold are otherwise similar and cannot precisely manipulate which side they fall on.
- Matching and propensity scores: construct comparison groups that balance observed confounders but offer no protection against unobserved ones.
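To make the instrumental variables idea concrete, the sketch below simulates one instrument and one exposure and computes the simple ratio (Wald) form of the IV estimator, cov(z, y) / cov(z, x). All data are invented; the instrument strength of 0.8 and the true effect of 1.0 are assumptions. The instrument is valid here only because the simulation constructs it that way, which is exactly what cannot be checked in real data.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


rng = random.Random(1)
n = 20_000
true_effect = 1.0  # assumed causal effect of exposure on outcome

u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument, independent of u by construction

# The exposure depends on the instrument and on the confounder.
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# The outcome depends on the exposure and the confounder, never on z directly.
y = [true_effect * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

ols_estimate = cov(x, y) / cov(x, x)  # biased upward by the confounder
first_stage = cov(z, x) / cov(z, z)   # relevance: how strongly z moves x
iv_estimate = cov(z, y) / cov(z, x)   # ratio (Wald) IV estimator

print(f"OLS: {ols_estimate:.2f}")
print(f"first stage: {first_stage:.2f}")
print(f"IV: {iv_estimate:.2f}")
```

The first-stage coefficient is worth reporting alongside the estimate: when it is close to zero, the ratio divides by a small, noisy number and the IV estimate becomes unstable.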
A Concrete Example: Difference-in-Differences
Suppose a government raises the minimum wage only in certain provinces. A researcher measures employment in both treated and control provinces before and after the policy. Simply comparing levels would be misleading because provinces may have different starting points. The difference-in-differences approach uses the control group's change over time as a guide to what would have happened in the treated group: if the two groups would have followed parallel trends absent the policy, any divergence in the treated group afterward can be attributed to the policy. Because this common trends assumption concerns an unobserved counterfactual, it cannot be verified directly; the researcher can only make it plausible by plotting pre-period trends and running pre-trend tests.
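The arithmetic behind the two-group, two-period comparison can be sketched as follows. The employment rates are invented purely for illustration, and the function name `did_estimate` is a hypothetical helper, not part of any library.

```python
from statistics import mean

# Hypothetical employment rates (percent) for three provinces per group.
# Every number here is invented for illustration.
employment = {
    ("treated", "before"): [61.0, 63.0, 62.0],
    ("treated", "after"): [60.0, 62.0, 61.0],
    ("control", "before"): [55.0, 57.0, 56.0],
    ("control", "after"): [56.0, 58.0, 57.0],
}


def did_estimate(cells):
    """Treated group's before-after change minus the control group's change."""
    treated_change = (mean(cells[("treated", "after")])
                      - mean(cells[("treated", "before")]))
    control_change = (mean(cells[("control", "after")])
                      - mean(cells[("control", "before")]))
    return treated_change - control_change


# Comparing post-policy levels alone mixes the policy with baseline differences.
naive_level_gap = (mean(employment[("treated", "after")])
                   - mean(employment[("control", "after")]))
estimate = did_estimate(employment)

print(f"post-period level gap: {naive_level_gap}")  # 4.0
print(f"difference-in-differences: {estimate}")     # -2.0
```

In this made-up example the treated provinces still have higher employment after the policy (a gap of 4 points), yet the difference-in-differences estimate is -2 points, because the control provinces rose by 1 point while the treated provinces fell by 1. The estimate is only as credible as the common trends assumption behind it.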
Common Pitfalls and Good Practice
The most common error is assuming the core assumptions rather than arguing for them. In instrumental variables, the exclusion restriction cannot be tested directly and can only be defended on theoretical grounds. In regression discontinuity, bandwidth selection can substantially change results, making sensitivity checks across multiple bandwidths essential. Matching adjusts only for observed confounders, so the researcher must be explicit about which variables were and were not measured. Good practice across all strategies requires explicitly justifying the design choice, discussing assumptions directly, presenting transparent robustness checks, and honestly acknowledging the limits of the inference.
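A bandwidth sensitivity check for regression discontinuity might look like the sketch below. It fits a separate straight line on each side of the cutoff within a window and takes the gap between the two fitted values at the cutoff. The data are simulated with an assumed true jump of 2.0 and a deliberately curved underlying relationship; `fit_line` and `rd_estimate` are hypothetical helpers written for this example, and practical analyses typically add kernel weights and data-driven bandwidth choices that are omitted here.

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of the OLS line for y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Local linear RD: gap between the two fitted lines at the cutoff."""
    left = [(r - cutoff, o) for r, o in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, o) for r, o in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    # The running variable is centered, so each intercept is the fit at the cutoff.
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left


rng = random.Random(2)
true_jump = 2.0  # assumed treatment effect at the cutoff
running = [rng.uniform(-1, 1) for _ in range(20_000)]
# Curved relationship plus a jump at zero; units at or above zero are treated.
outcome = [3.0 * r ** 3 + r + true_jump * (r >= 0) + rng.gauss(0, 0.5)
           for r in running]

results = {h: rd_estimate(running, outcome, 0.0, h)
           for h in (0.1, 0.25, 0.5, 1.0)}
for h, est in results.items():
    print(f"bandwidth {h:>4}: estimated jump {est:.2f}")
```

With this simulated curvature, narrow windows should recover a jump close to 2.0 with more noise, while the widest window is pulled well below it because a straight line fits the curve poorly. Reporting the whole set of estimates, rather than one chosen bandwidth, lets readers judge how fragile the result is.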
Key terms
- Instrumental Variable: An external factor that affects the exposure, is unrelated to unobserved confounders, and has no direct effect on the outcome.
- Difference-in-Differences: A method that estimates causal effects by subtracting the control group's change over time from the treated group's change, under the common trends assumption.
- Regression Discontinuity: A design that estimates causal effects by comparing units just on either side of an assignment cutoff.
- Exclusion Restriction: The untestable assumption that an instrument affects the outcome only through the exposure channel.
- Common Trends Assumption: The condition in difference-in-differences that both groups would have followed the same time trend absent treatment.
Further reading
- Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press. ISBN: 978-0-691-12035-5