Missing Data Mechanisms

MCAR, MAR, MNAR

Missing values are all but inevitable in research data; what matters is why the data are missing. The statistical literature classifies missingness into three mechanisms: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). The mechanism in play determines which analytic strategies yield valid results. Simple approaches such as listwise deletion are convenient but can produce biased estimates unless the data are MCAR; multiple imputation and maximum-likelihood methods are the modern standard and are statistically valid under the weaker MAR assumption.

Core Concept: Why Does Missingness Matter?

Missing values do more than reduce sample size; more critically, they can systematically bias analytical results. Whether they do depends on the mechanism by which the data came to be missing. The missing-data mechanism describes the relationship between the observed data, the unobserved data, and the process that governs which values are absent. Because the mechanism cannot be fully confirmed from the observed data, researchers should first settle on a defensible assumption about it, then choose a method that is valid under that assumption. A missing-data technique applied without considering the mechanism risks producing biased and unreliable estimates.

The Three Mechanisms: MCAR, MAR, and MNAR

MCAR (Missing Completely At Random): The probability that a value is missing is unrelated to both the observed and the unobserved data; missingness is entirely by chance. For example, a questionnaire page lost in the mail is missing for reasons unrelated to anything the respondent wrote.

MAR (Missing At Random): The probability of missingness may depend on observed variables but, once those variables are taken into account, not on the missing value itself. For example, older participants may skip a survey item more often; if respondents of the same age are equally likely to skip it regardless of what their answer would have been, missingness is explained by the observed age variable.

MNAR (Missing Not At Random): The probability of missingness depends on the unobserved value itself, even after accounting for the observed variables. For instance, if high-income individuals are more likely to leave an income item blank, missingness is tied to the magnitude of the value that is absent.
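The three definitions can be made concrete with a small simulation. The sketch below is illustrative only: the variables (age, income), the coefficients, and the function name simulate_missingness are assumptions invented for this example, not taken from the source. It deletes income values under each mechanism and compares the mean of the remaining values with the mean of the full data.

```python
import numpy as np

def simulate_missingness(n=100_000, seed=0):
    """Compare complete-case means of income under MCAR, MAR and MNAR.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    age = rng.normal(50, 10, n)                      # always observed
    income = 30 + 0.5 * age + rng.normal(0, 5, n)    # may go missing

    def logistic(z):
        return 1 / (1 + np.exp(-z))

    # MCAR: every value has the same 30% chance of being missing.
    miss_mcar = rng.random(n) < 0.3
    # MAR: the chance of missingness depends only on the observed age.
    miss_mar = rng.random(n) < logistic((age - 50) / 5)
    # MNAR: the chance of missingness depends on income itself.
    miss_mnar = rng.random(n) < logistic((income - 55) / 5)

    # Under MAR, conditioning on age removes the bias: fit income ~ age
    # on the complete cases, then predict at the full-sample mean age.
    obs = ~miss_mar
    slope, intercept = np.polyfit(age[obs], income[obs], 1)

    return {
        "full": income.mean(),
        "MCAR": income[~miss_mcar].mean(),
        "MAR": income[obs].mean(),
        "MAR_adjusted": intercept + slope * age.mean(),
        "MNAR": income[~miss_mnar].mean(),
    }

results = simulate_missingness()
for name, value in results.items():
    print(f"{name:>12}: {value:.2f}")
```

In this setup the MCAR mean stays close to the full-data mean, while the unadjusted MAR and MNAR means fall below it. The MAR bias disappears once age is used in the estimate; no comparable correction is available under MNAR, because the variable that drives missingness is the one that was not recorded.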

Common Misconceptions and Misuses

The most common misconception is that listwise deletion is always safe. Listwise deletion is guaranteed to give unbiased estimates only under MCAR; under MAR or MNAR it can introduce systematic bias, and even under MCAR it discards information and reduces power. A second misconception is treating MAR and MCAR as equivalent; in fact, MAR is a substantially weaker and more realistic assumption. A third is believing that the mechanism can be established from the data: MCAR can be checked against the observed variables to some extent, but MAR cannot be distinguished from MNAR using the observed data alone, so the choice between them is inherently an untestable assumption. Finally, single imputation methods such as mean substitution shrink the variance of the imputed variable and understate standard errors, producing confidence intervals that are narrower than they should be, which is why they are not recommended in modern practice.
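The effect of mean substitution on standard errors can be checked directly. The following is a minimal sketch on simulated data; the sample size, the 40% missingness rate, and the function name compare_standard_errors are assumptions made for illustration.

```python
import numpy as np

def standard_error(x):
    """Naive standard error of the mean: sample SD / sqrt(n)."""
    return x.std(ddof=1) / np.sqrt(len(x))

def compare_standard_errors(n=1_000, missing_rate=0.4, seed=1):
    rng = np.random.default_rng(seed)
    y = rng.normal(100, 15, n)
    missing = rng.random(n) < missing_rate   # MCAR, for simplicity
    y_obs = y[~missing]

    # Mean substitution: every missing value becomes the observed mean.
    y_filled = y.copy()
    y_filled[missing] = y_obs.mean()

    return {
        "sd_observed": y_obs.std(ddof=1),
        "sd_mean_imputed": y_filled.std(ddof=1),
        "se_observed": standard_error(y_obs),
        "se_mean_imputed": standard_error(y_filled),
    }

summary = compare_standard_errors()
for name, value in summary.items():
    print(f"{name:>16}: {value:.3f}")
```

The filled-in values add no new information, yet they shrink the sample standard deviation and inflate the apparent sample size. Both effects push the naive standard error below the one computed from the observed cases alone, even though the point estimate of the mean is unchanged.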

What to Do in Research Practice?

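When the MAR assumption is reasonable, multiple imputation (MI) and full information maximum likelihood (FIML) are the statistically valid and preferred approaches. In multiple imputation, missing values are filled in several times with draws that reflect uncertainty about the missing values; the analysis is run on each completed dataset and the results are pooled, so that the final standard errors include both the usual sampling variability and the additional uncertainty due to missingness. Under MNAR, methods that model the missingness process explicitly, such as selection models, are needed and should be accompanied by sensitivity analyses; no single method can guarantee unbiased results, because the assumptions involved cannot be checked against the observed data. In all cases, researchers should report the amount and pattern of missing data and justify their chosen strategy; this is a requirement of transparent scientific reporting.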
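The impute, analyze, and pool cycle described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a substitute for established MI software: the data are simulated, the imputation model is a normal linear regression of y on a fully observed x, and the helper names impute_once and pool_rubin are hypothetical. Pooling follows Rubin's rules: the pooled estimate is the average of the per-dataset estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors (Rubin's rules)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    within = variances.mean()            # average sampling variance
    between = estimates.var(ddof=1)      # variance across imputations
    total = within + (1 + 1 / m) * between
    return estimates.mean(), np.sqrt(total)

def impute_once(x, y, missing, rng):
    """One imputation of y from a normal linear regression on x.

    Regression parameters are drawn from their approximate posterior so
    that imputations also reflect uncertainty in the fitted model.
    """
    obs = ~missing
    design = np.column_stack([np.ones(obs.sum()), x[obs]])
    beta_hat, *_ = np.linalg.lstsq(design, y[obs], rcond=None)
    resid = y[obs] - design @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(obs.sum() - 2)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    beta = rng.multivariate_normal(beta_hat, cov)

    y_imp = y.copy()
    noise = rng.normal(0, np.sqrt(sigma2), missing.sum())
    y_imp[missing] = beta[0] + beta[1] * x[missing] + noise
    return y_imp

# Simulated example: y is MAR because missingness depends only on x.
rng = np.random.default_rng(2)
n, m = 2_000, 20
x = rng.normal(0, 1, n)
y_true = 10 + 2 * x + rng.normal(0, 1, n)       # true mean of y is 10
missing = rng.random(n) < 1 / (1 + np.exp(-x))
y = np.where(missing, np.nan, y_true)

estimates, variances = [], []
for _ in range(m):
    y_imp = impute_once(x, y, missing, rng)
    estimates.append(y_imp.mean())               # analysis: mean of y
    variances.append(y_imp.var(ddof=1) / n)      # its squared SE

pooled_mean, pooled_se = pool_rubin(estimates, variances)
complete_case_mean = y[~missing].mean()
print(f"complete-case mean: {complete_case_mean:.2f}")
print(f"pooled MI mean:     {pooled_mean:.2f} (SE {pooled_se:.3f})")
```

In this simulation the complete-case mean is pulled away from the true value of 10 because cases with large x are under-represented, while the pooled estimate recovers it. The pooled standard error is always at least as large as the average within-imputation standard error; the between-imputation term is what carries the extra uncertainty due to missing values.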

Sources

  1. Little, R. J. A., & Rubin, D. B. (2019). Statistical Analysis with Missing Data (3rd ed.). Wiley. ISBN: 978-0-470-52679-8