Missing Data and Attrition
Missing data are values that were intended to be collected but were not obtained, and attrition is the loss of participants over the course of a study, often through dropout or loss to follow-up. Both reduce the information available and, more seriously, can bias results when the chance of a value being missing is related to what that value would have been. Anticipating and limiting missingness at the design stage, and handling it appropriately in analysis, are essential to preserving a study's validity.
Definition
Missing data are intended observations that were not recorded, and attrition is the loss of enrolled participants during a study. Their impact depends on the missingness mechanism, which ranges from missing completely at random (unrelated to any observed or unobserved data) through missing at random (explainable by observed data) to missing not at random (related to the unobserved value itself).
Scope
The entry covers the types of missingness (missing completely at random, at random, and not at random), the consequences of attrition for bias and power, prevention strategies built into design and conduct, and principled handling methods such as multiple imputation and the intention-to-treat approach. It is framed as a methodological reference and does not give clinical instructions.
Key concepts
- Missing completely at random (MCAR)
- Missing at random (MAR)
- Missing not at random (MNAR)
- Loss to follow-up and dropout
- Multiple imputation
- Intention-to-treat analysis
- Complete-case analysis and its biases
- Sensitivity analysis for missingness assumptions
Mechanisms
The threat from missing data depends on why values are missing. If missingness is unrelated to any observed or unobserved data (MCAR), complete-case analyses lose precision but remain unbiased. If it can be fully explained by observed variables (MAR), methods such as multiple imputation can recover valid estimates by modeling the missing values from the observed ones, provided the imputation model is adequate. If it depends on the unobserved value itself even after accounting for the observed data (MNAR), no method can guarantee an unbiased result and conclusions hinge on untestable assumptions. Attrition that is related to treatment or prognosis can break the balance that randomization created, which is why intention-to-treat analysis keeps participants in their assigned groups and why prevention, rather than after-the-fact repair, is emphasized. Sensitivity analyses examine how conclusions change under different assumptions about the missingness.
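The three mechanisms can be illustrated with a small simulation. The sketch below is illustrative only: the function names, the coefficients, and the logistic form of the missingness probabilities are assumptions chosen for the example, not taken from any cited study. It compares the complete-case mean of an outcome with a regression-adjusted mean that uses a fully observed covariate. The adjustment is a simple single-model stand-in for "modeling the missing values from the observed ones", not full multiple imputation.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here to turn a score into a missingness probability."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate_mechanisms(n=200_000, seed=0):
    """Simulate MCAR, MAR and MNAR missingness in an outcome y.

    All numeric settings are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                       # covariate, always observed
    y = 0.8 * x + rng.normal(scale=0.6, size=n)  # outcome, population mean 0

    # Probability that y is missing under each mechanism (assumed forms).
    p_missing = {
        "MCAR": np.full(n, 0.3),         # unrelated to any data
        "MAR": sigmoid(-1.0 + 1.5 * x),  # depends only on the observed x
        "MNAR": sigmoid(-1.0 + 1.5 * y), # depends on the unobserved y itself
    }

    results = {"true_mean": float(y.mean())}
    for name, p in p_missing.items():
        observed = rng.random(n) >= p

        # Complete-case estimate: average only the observed outcomes.
        complete_case = y[observed].mean()

        # Regression adjustment: fit y on x among the observed cases, then
        # predict the mean of y at the covariate mean of the full sample.
        slope, intercept = np.polyfit(x[observed], y[observed], 1)
        adjusted = intercept + slope * x.mean()

        results[name] = {
            "complete_case": float(complete_case),
            "adjusted": float(adjusted),
        }
    return results


if __name__ == "__main__":
    for key, value in simulate_mechanisms().items():
        print(key, value)
```

Under these assumptions, the complete-case mean should sit close to the true mean for MCAR, drift away from it for MAR and MNAR, and be pulled back by the covariate adjustment only in the MAR case.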
Clinical relevance
Appraising how much was missing, why, and how it was handled is part of judging whether a study's results are trustworthy, because high or differential attrition can exaggerate or mask an effect. This entry describes research methodology for appraisal and is not a source of diagnostic or treatment guidance.
Evidence & guidelines
An expert panel convened for the U.S. Food and Drug Administration emphasized preventing missing data through trial design and conduct and cautioned against relying on any single analytic fix. Methodological guidance describes multiple imputation under the missing-at-random assumption and its pitfalls, and the intention-to-treat framework for trials with missing outcomes; reporting standards such as CONSORT call for a participant flow diagram documenting losses. Surveys show that intention-to-treat is often inconsistently defined and applied in practice.
History
The modern framework was shaped by Rubin's 1970s formalization of missingness mechanisms and his development of multiple imputation, and by Little and Rubin's subsequent work on statistical analysis with missing data. As randomized trials matured, the intention-to-treat principle became central to handling dropout without breaking randomization. A 2010 U.S. National Research Council report, produced by a panel convened at the FDA's request, later reframed missing data as primarily a problem of prevention by design rather than of post hoc statistical correction.
Debates
- Can multiple imputation rescue a study with substantial missing data?
  - Multiple imputation gives valid inference when data are missing at random, but its validity rests on an assumption that cannot be verified from the data; when data are missing not at random it can mislead, so it is a tool to be used with sensitivity analysis rather than a guaranteed fix.
- How should intention-to-treat handle missing outcomes?
  - Intention-to-treat keeps participants in their randomized groups to preserve balance, but when outcomes are missing it cannot be applied without assumptions about the missing values; how to combine the principle with imputation and sensitivity analysis remains a practical challenge.
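For readers unfamiliar with how multiple imputation turns several imputed datasets into one inference, the pooling step can be sketched as follows. This is a minimal sketch of the standard combining rules, commonly called Rubin's rules. The function name and the input values in the usage note are hypothetical, and the sketch assumes each imputed dataset has already been analyzed to give a point estimate and its squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)       # average of the point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations (divisor m - 1)

    # The between-imputation term carries the extra uncertainty due to
    # missingness; the (1 + 1/m) factor accounts for using finitely many imputations.
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total
```

As a usage illustration with made-up numbers, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` pools three imputations into a single estimate with a variance larger than any single imputation's, because disagreement between imputations is counted as uncertainty. The pooling is only as trustworthy as the missing-at-random assumption behind the imputations.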
Key figures
- Roderick Little
- Donald Rubin
- Ian White
- Jonathan Sterne
- Douglas Altman
Related topics
Seminal works
- little-2012-prevention
- sterne-2009-mi
- white-2011-itt
Frequently asked questions
- Why is the reason data are missing more important than how much is missing?
  - Even a modest amount of missing data can bias results if the chance of being missing depends on the unobserved value, whereas data missing for reasons unrelated to the value mainly cost precision; the mechanism, not just the quantity, determines whether and how much bias arises.
- What is intention-to-treat analysis and why does it matter for attrition?
  - Intention-to-treat analyzes participants in the groups to which they were randomized, regardless of what happened afterward, which preserves the balance created by randomization; it matters for attrition because excluding dropouts or analyzing only those who completed treatment can reintroduce the confounding that randomization removed.
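The contrast between intention-to-treat and a completers-only analysis can be made concrete with a simulated trial. The sketch below rests on loudly stated assumptions: the treatment has no true effect, participants with poor prognosis are more likely to stop the assigned treatment in the treatment arm, and outcomes are still measured for everyone. The function name and the dropout model are hypothetical. When outcomes are actually missing, intention-to-treat additionally needs assumptions about the missing values, as noted in the Debates section.

```python
import numpy as np


def compare_itt_and_completers(n=100_000, seed=1):
    """Simulate a null-effect trial with prognosis-related discontinuation.

    Returns the estimated treatment effect (difference in mean outcome,
    treatment minus control) under two analyses. All settings are hypothetical.
    """
    rng = np.random.default_rng(seed)
    arm = rng.integers(0, 2, size=n)          # 1 = treatment, 0 = control
    prognosis = rng.normal(size=n)            # higher = better expected outcome
    outcome = prognosis + rng.normal(size=n)  # no true treatment effect

    # Assumed discontinuation model: in the treatment arm, poor prognosis
    # raises the chance of stopping; in the control arm it is a flat 10%.
    p_stop_treated = 1.0 / (1.0 + np.exp(2.0 * prognosis + 1.0))
    p_stop = np.where(arm == 1, p_stop_treated, 0.1)
    completed = rng.random(n) >= p_stop

    def effect(mask):
        treated = outcome[mask & (arm == 1)].mean()
        control = outcome[mask & (arm == 0)].mean()
        return float(treated - control)

    return {
        # Everyone is analyzed in the arm they were randomized to.
        "itt": effect(np.ones(n, dtype=bool)),
        # Only participants who completed their assigned treatment are analyzed.
        "completers": effect(completed),
    }


if __name__ == "__main__":
    print(compare_itt_and_completers())
```

Under these assumptions the intention-to-treat estimate should stay near zero, while the completers-only estimate shows a spurious benefit because the treated completers have better prognosis than the control completers.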