Confounders, Colliders, and Mediators
Which variables to adjust for
In causal analysis, not every variable plays the same role. A confounder is a common cause of both exposure and outcome; adjusting for it removes bias. A mediator lies on the pathway from exposure to outcome; adjusting for it blocks part of the very effect you want to estimate. A collider is a common effect of two variables; adjusting for it opens a spurious association. Knowing which role a variable plays is essential for making correct adjustment decisions.
Three Core Roles: Definitions
In causal diagrams (DAGs), each variable occupies a distinct structural position in the relationship between exposure X and outcome Y. A confounder C has arrows pointing to both X and Y; it is a common cause. A mediator M sits on the path from X to Y: X affects Y through M. A collider K receives arrows from both X and Y; it is a common effect of the two variables. These three structures generate fundamentally different inferential problems and require different analytic strategies.
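The three structural positions can be read mechanically from a list of arrows. The sketch below is illustrative only: the function name `classify_role` and the edge list are hypothetical, and it inspects direct arrows only, whereas a real DAG analysis would reason about whole paths.

```python
def classify_role(edges, var, exposure="X", outcome="Y"):
    """Classify `var` relative to exposure and outcome using direct arrows only."""
    edge_set = set(edges)
    causes_x = (var, exposure) in edge_set
    causes_y = (var, outcome) in edge_set
    caused_by_x = (exposure, var) in edge_set
    caused_by_y = (outcome, var) in edge_set

    if causes_x and causes_y:
        return "confounder"   # common cause: var -> X and var -> Y
    if caused_by_x and causes_y:
        return "mediator"     # on the path: X -> var -> Y
    if caused_by_x and caused_by_y:
        return "collider"     # common effect: X -> var <- Y
    return "other"


# Hypothetical DAG using the symbols from the text: C, M, K, X, Y.
edges = [
    ("C", "X"), ("C", "Y"),   # C is a common cause
    ("X", "M"), ("M", "Y"),   # M lies between X and Y
    ("X", "K"), ("Y", "K"),   # K is a common effect
    ("X", "Y"),               # direct effect of X on Y
]

roles = {v: classify_role(edges, v) for v in ("C", "M", "K")}
print(roles)
```

The point of the sketch is that the role comes from the direction of the arrows, which is an assumption encoded in the diagram, not something computed from data.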
Adjustment Logic: When to Control and When Not To
For a confounder, including it in a regression or matching on it removes confounding bias, provided the confounder is measured and modeled adequately. For a mediator, adjusting blocks the portion of the total causal effect that flows through it; this is an error if you want the total effect. For a collider, adjustment opens a previously closed path and creates an association between its causes that does not reflect any causal effect; Berkson bias is a well-known instance of this mechanism. The guiding rule: adjust for confounders, leave mediators unadjusted when estimating total effects, and do not adjust for colliders.
Concrete Example: Education, Income, and Health
Consider the relationship between education X and health Y. Socioeconomic background affects both education and health, making it a confounder that should be controlled. Income M is a pathway through which education influences health; including it in the model removes that part of the effect, which is a mistake if you want the total causal effect. Now suppose you study only hospitalized individuals: if hospitalization is a common effect of both education and health status, it is a collider. Restricting the sample to hospitalized individuals conditions on that collider and distorts the association between education and health, so the estimate no longer reflects the relationship in the general population.
Common Pitfalls and Good Practice
The most common error is deciding a variable's role from statistical associations alone. Both a collider and a confounder may correlate with X and Y; the distinction can only be made from the causal structure in a DAG. A second error is automatically adjusting for mediators, which removes the part of the effect that runs through them and therefore misstates the total causal effect. Good practice: clarify the research question, draw a DAG based on prior subject-matter knowledge, read each variable's role from that diagram, and only then make analytic decisions. Adjustment choices made without a DAG are little more than guesswork.
Key terms
- Confounder: A common cause of both exposure and outcome; its presence creates confounding bias.
- Mediator: A variable on the causal path from exposure to outcome; part of the mechanism.
- Collider: A common effect of two variables; adjusting for it induces spurious association.
- Directed Acyclic Graph (DAG): A graph encoding assumed causal relationships among variables without cycles.
- Berkson Bias: Selection bias arising in clinical samples from conditioning on a collider variable.
Further reading
- Greenland, S., Pearl, J., & Robins, J. M. (1999). Causal diagrams for epidemiologic research. Epidemiology, 10(1), 37-48. DOI: 10.1097/00001648-199901000-00008