Directed Acyclic Graphs (DAGs)
Drawing causal assumptions explicitly
A directed acyclic graph (DAG) encodes causal assumptions as arrows between variables, with no cycles allowed. By tracing paths on the DAG, a researcher identifies which variables are confounders that must be controlled, which are mediators, and which are colliders that should be left alone. This makes the assumptions behind any causal claim explicit and open to scrutiny, turning intuition about confounding into a transparent diagram whose testable implications can be checked against data.
What Is a DAG?
A DAG consists of nodes representing variables and directed arrows representing assumed causal effects. Directionality means each arrow points from cause to effect; acyclicity means no variable can cause itself through a chain of arrows, ruling out feedback loops within a single time-slice model. The presence or absence of an arrow encodes a researcher's prior beliefs about causal structure, and a missing arrow is the stronger claim, since it asserts that one variable has no direct effect on the other. By making these beliefs visible, a DAG allows them to be critiqued, tested, and revised. This is its primary epistemic value.
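The acyclicity requirement can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary mapping each node to a list of its children; the function name `is_acyclic` and the two example graphs are hypothetical, chosen only for illustration. It uses Kahn's topological-sort algorithm: if every node can be removed in an order where it has no remaining incoming arrows, the graph has no cycle.

```python
from collections import deque

def is_acyclic(graph):
    """Return True if the directed graph (node -> list of children) has no cycle."""
    # Collect every node, including those that appear only as children.
    nodes = set(graph) | {c for children in graph.values() for c in children}
    indegree = {n: 0 for n in nodes}
    for children in graph.values():
        for child in children:
            indegree[child] += 1

    # Kahn's algorithm: repeatedly remove nodes with no incoming arrows.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in graph.get(node, ()):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # If some nodes were never removed, they sit on a cycle.
    return removed == len(nodes)

# Hypothetical examples: a confounded treatment-outcome graph and a feedback loop.
dag = {"Z": ["X", "Y"], "X": ["Y"]}
feedback = {"X": ["Y"], "Y": ["X"]}

print(is_acyclic(dag))       # True
print(is_acyclic(feedback))  # False
```

A graph that fails this check is not a DAG, and the path-tracing rules for DAGs do not apply to it as drawn.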
How to Read and Apply a DAG
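Every path in a DAG is built from three elementary structures: chains (X → M → Y), forks (X ← Z → Y), and colliders (X → C ← Y). Chains and forks transmit association unless the middle variable is conditioned on; a collider blocks association unless it, or one of its descendants, is conditioned on. A back-door path is any path between treatment and outcome that begins with an arrow pointing into the treatment. A set of variables satisfies the back-door criterion if it contains no descendant of the treatment and blocks every back-door path; adjusting for such a set removes the spurious association between treatment and outcome. Conditioning on a collider opens a previously closed path and induces bias. Controlling for a mediator blocks part of the causal effect of interest. The DAG thus provides systematic, logic-based guidance on which variables to include in or exclude from adjustment.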
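These reading rules can be automated. The sketch below is one possible implementation, not a reference one: it tests d-separation with the standard ancestral moral graph construction and then checks the back-door criterion by deleting the arrows that leave the treatment. The function names, the dictionary representation (node to list of children), and the example graph with confounder Z, mediator M, and collider C are all assumptions made for illustration.

```python
from itertools import combinations

def parents_of(graph, node):
    return {p for p, children in graph.items() if node in children}

def ancestors_of(graph, nodes):
    """Return the given nodes together with all of their ancestors."""
    found, stack = set(nodes), list(nodes)
    while stack:
        for p in parents_of(graph, stack.pop()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

def descendants_of(graph, node):
    found, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def d_separated(graph, x, y, given):
    """True if x and y are d-separated by the set `given` (x, y not in given)."""
    keep = ancestors_of(graph, {x, y} | set(given))
    # Moralize the ancestral subgraph: link co-parents, drop arrow directions.
    undirected = {n: set() for n in keep}
    for child in keep:
        pars = parents_of(graph, child)
        for p in pars:
            undirected[p].add(child)
            undirected[child].add(p)
        for a, b in combinations(pars, 2):
            undirected[a].add(b)
            undirected[b].add(a)
    # Delete the conditioned nodes and look for any remaining x-y connection.
    blocked, seen, stack = set(given), {x}, [x]
    while stack:
        for nb in undirected[stack.pop()]:
            if nb == y:
                return False
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                stack.append(nb)
    return True

def satisfies_backdoor(graph, x, y, adjust):
    adjust = set(adjust)
    if adjust & descendants_of(graph, x):
        return False  # adjustment sets may not contain descendants of x
    # Remove arrows leaving x so that only back-door paths remain.
    pruned = {n: ([] if n == x else list(c)) for n, c in graph.items()}
    return d_separated(pruned, x, y, adjust)

# Hypothetical graph: Z confounds X and Y, M mediates, C is a collider.
g = {"Z": ["X", "Y"], "X": ["M", "C"], "M": ["Y"], "Y": ["C"]}
print(satisfies_backdoor(g, "X", "Y", {"Z"}))       # True
print(satisfies_backdoor(g, "X", "Y", set()))       # False: X <- Z -> Y is open
print(satisfies_backdoor(g, "X", "Y", {"Z", "M"}))  # False: M descends from X

collider = {"A": ["C"], "B": ["C"]}
print(d_separated(collider, "A", "B", set()))   # True: collider blocks the path
print(d_separated(collider, "A", "B", {"C"}))   # False: conditioning opens it
```

The design choice here is to favor brevity over speed: `parents_of` rescans the whole dictionary on each call, which is acceptable for hand-drawn graphs with a few dozen nodes but would be worth replacing with a precomputed parent map for larger ones.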
A Concrete Example
Consider a researcher studying whether education level (X) affects wages (Y). Socioeconomic background (Z) may influence both education and wages, making it a confounder that must be controlled. Occupation (M), however, lies on the causal path from education to wages; controlling for M would block the indirect effect that runs through occupation, so the estimate would capture only the direct effect rather than the total effect of education. A DAG makes this structure explicit: to estimate the total effect, condition on Z, not on M. Moving from intuition to a diagram forces the researcher to state these structural assumptions openly, making the analysis choices transparent and open to peer scrutiny.
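A small simulation can make the consequences of each adjustment choice concrete. The sketch below assumes a linear structural model with invented coefficients (none of them come from real wage data): the direct effect of X on Y is set to 0.3 and the indirect effect through M to 0.5 × 0.4 = 0.2, so the total effect is 0.5. The helper `adjusted_slope` is a hypothetical name; it computes the ordinary least squares coefficient on X by partialling out the control variables.

```python
import random

def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def project_out(v, u):
    """Return v minus its least-squares projection onto u."""
    coef = sum(a * b for a, b in zip(v, u)) / sum(b * b for b in u)
    return [a - coef * b for a, b in zip(v, u)]

def adjusted_slope(y, x, controls=()):
    """OLS coefficient on x when y is regressed on x plus the controls."""
    y, x = center(y), center(x)
    basis = []
    for c in controls:
        c = center(c)
        for b in basis:      # orthogonalize the controls (Gram-Schmidt)
            c = project_out(c, b)
        basis.append(c)
    for b in basis:          # partial the controls out of y and x
        y, x = project_out(y, b), project_out(x, b)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Assumed data-generating process (all coefficients are illustrative).
rng = random.Random(0)
n = 20_000
Z = [rng.gauss(0, 1) for _ in range(n)]                    # background
X = [0.8 * z + rng.gauss(0, 1) for z in Z]                 # education
M = [0.5 * x + rng.gauss(0, 1) for x in X]                 # occupation
Y = [0.3 * x + 0.4 * m + 0.6 * z + rng.gauss(0, 1)         # wages
     for x, m, z in zip(X, M, Z)]

naive = adjusted_slope(Y, X)           # confounded by Z, biased upward
adj_z = adjusted_slope(Y, X, [Z])      # close to the total effect, 0.5
adj_zm = adjusted_slope(Y, X, [Z, M])  # close to the direct effect, 0.3

print(f"no adjustment:   {naive:.2f}")
print(f"adjust for Z:    {adj_z:.2f}")
print(f"adjust for Z, M: {adj_zm:.2f}")
```

Under these assumptions, only the Z-adjusted regression targets the total effect; adding M changes the question being answered rather than improving the answer.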
Common Pitfalls and Good Practice
The most common mistake is including every measured variable as a covariate; this naive approach can open paths through colliders, block effects that run through mediators, and introduce bias rather than remove it. The DAG should be drawn before data analysis, grounded in theory and prior knowledge rather than empirical associations. When feedback loops are suspected, time-indexed variables in a repeated-measures structure are needed. A DAG also serves as a measurement audit: if a back-door path through an unmeasured confounder remains open, the researcher must explicitly acknowledge this limitation rather than claim full causal identification.
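The collider pitfall can be illustrated with a short simulation. This is a sketch under invented assumptions: X and Y are generated independently, so there is no causal or confounded relationship between them, and a hypothetical collider C is caused by both. Restricting the sample to units with high C (one form of conditioning) manufactures a negative association that does not exist in the full data.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

# Assumed data-generating process: X and Y are independent causes of C.
rng = random.Random(1)
n = 20_000
X = [rng.gauss(0, 1) for _ in range(n)]
Y = [rng.gauss(0, 1) for _ in range(n)]
C = [x + y + rng.gauss(0, 1) for x, y in zip(X, Y)]

# Condition on the collider by keeping only units with C above a threshold.
selected = [(x, y) for x, y, c in zip(X, Y, C) if c > 1.0]

overall = pearson(X, Y)
within = pearson([x for x, _ in selected], [y for _, y in selected])

print(f"correlation in full sample:      {overall:.2f}")  # near zero
print(f"correlation among selected rows: {within:.2f}")   # clearly negative
```

The same distortion arises when C is added as a regression covariate instead of being used as a selection rule, which is why a variable's position in the DAG, not its availability in the dataset, should decide whether it is adjusted for.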
Key terms
- Confounder: A variable that influences both treatment and outcome, creating a spurious association between them.
- Collider: A node on a path into which two arrows point; conditioning on it, or on one of its descendants, opens a spurious path.
- Mediator: An intermediate variable on the causal pathway from treatment to outcome.
- Back-Door Criterion: A rule for choosing an adjustment set: the set must contain no descendant of the treatment and must block every back-door path between treatment and outcome.
- d-Separation: A graphical criterion determining conditional independence between variable sets in a DAG.
Further reading
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press. ISBN: 978-0-521-89560-6