Posterior Predictive Checks — ScholarGate Atlas

Definition

A posterior predictive check generates replicated data from a fitted model's posterior predictive distribution and compares features of these replications to the same features of the observed data, flagging systematic discrepancies as evidence of model misfit.
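
In symbols, the posterior predictive distribution averages the sampling model over the posterior. The notation below is an assumption made for illustration and is not fixed by this entry: θ denotes the model parameters, y the observed data, and y^rep a replicated dataset.

```latex
p(y^{\mathrm{rep}} \mid y) = \int p(y^{\mathrm{rep}} \mid \theta)\, p(\theta \mid y)\, d\theta
```

In practice a draw from this distribution is usually obtained in two steps: draw θ from the posterior, then draw y^rep from the sampling model given that θ.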

Scope

This topic covers the simulation of replicated datasets from the posterior predictive distribution, the use of test quantities and discrepancy measures, graphical checks, and posterior predictive p-values, along with their interpretation as a self-consistency check rather than a hypothesis test.

Core questions

How are replicated datasets drawn from the posterior predictive distribution?
What are test quantities and discrepancy measures, and how are they chosen?
How is a posterior predictive p-value computed and interpreted?
Why is posterior predictive checking a check of fit rather than a model-selection rule?

Key concepts

posterior predictive distribution
replicated data
test quantity
discrepancy measure
posterior predictive p-value
graphical model checking

Key theories

Replicated-data comparison: If a model fits, data simulated from it should resemble the observed data in relevant respects; systematic differences in chosen test quantities reveal where the model fails.
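Posterior predictive p-values: The posterior predictive p-value is the probability, taken over the joint posterior distribution of the parameters and the replicated data, that a discrepancy measure for replicated data equals or exceeds that for the observed data. Values near 0 or 1 signal misfit in the feature being tested. It is a diagnostic summary that complements graphical checks; it tends to be conservative and is not a calibrated frequentist test.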
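
The two ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes a normal model with the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2, so the posterior can be sampled in closed form. The function name `posterior_predictive_pvalue` and the synthetic skewed dataset are hypothetical choices made for the example. It handles test quantities that depend on the data only, not discrepancies that also depend on parameters.

```python
import numpy as np

def posterior_predictive_pvalue(y, test_quantity, n_draws=4000, seed=0):
    """Estimate Pr(T(y_rep) >= T(y) | y) for a normal model (illustrative).

    Assumes y_i ~ Normal(mu, sigma^2) with prior p(mu, sigma^2) ∝ 1/sigma^2.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n, ybar, s2 = y.size, y.mean(), y.var(ddof=1)

    # Posterior draws: sigma^2 | y is scaled inverse chi-square,
    # then mu | sigma^2, y is Normal(ybar, sigma^2 / n).
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))

    # One replicated dataset of size n per posterior draw.
    y_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_draws, n))

    # Compare the test quantity on each replication with its observed value.
    t_rep = np.apply_along_axis(test_quantity, 1, y_rep)
    t_obs = test_quantity(y)
    return float(np.mean(t_rep >= t_obs))

# Hypothetical right-skewed data: quantiles of a unit exponential distribution.
n = 100
u = (np.arange(n) + 0.5) / n
y = -np.log(1.0 - u)

p_mean = posterior_predictive_pvalue(y, np.mean)  # feature the model fits by construction
p_max = posterior_predictive_pvalue(y, np.max)    # feature sensitive to the long right tail
print(f"p-value for mean: {p_mean:.3f}, p-value for max: {p_max:.3f}")
```

With these skewed data, the p-value for the mean should come out near 0.5, while the p-value for the maximum should be close to 0, because a normal model rarely reproduces so large an observed maximum. The same model therefore passes one check and fails another, which is why the choice of test quantity matters.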

Clinical relevance

Posterior predictive checks let analysts detect important model misfit before reporting conclusions, which matters in any applied Bayesian analysis where an inadequate model could mislead decisions.

History

Rubin proposed Bayesian predictive checking in 1984. In 1996, Gelman, Meng, and Stern extended it to realized discrepancies, which may depend on the model parameters as well as the data. The approach has become standard practice in applied Bayesian workflows, often via graphical checks.

Debates

Double use of the data: Because the same data inform both the fitted model and the check, posterior predictive p-values tend to be conservative. Under a correct model they are not uniformly distributed but concentrate around 0.5, which reduces the power of the check to detect misfit. This has prompted alternatives such as cross-validated checks.
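
The non-uniformity can be seen in a small simulation. The sketch below is a deliberately extreme case chosen for illustration, and its setup is an assumption rather than something taken from this entry: data come from Normal(mu, 1) with known unit variance, the prior on mu is flat, and the test quantity is the sample mean. In this setup the exact posterior predictive p-value is 0.5 for every dataset, so the simulated values differ from 0.5 only by Monte Carlo noise. The function name `simulate_ppp_under_true_model` is hypothetical.

```python
import numpy as np

def simulate_ppp_under_true_model(n=20, n_datasets=200, n_draws=2000, seed=1):
    """Posterior predictive p-values for T = sample mean when the model is true.

    Assumes y_i ~ Normal(mu, 1) with a flat prior on mu, so that
    mu | y ~ Normal(ybar, 1/n).
    """
    rng = np.random.default_rng(seed)
    p_values = np.empty(n_datasets)
    for k in range(n_datasets):
        # Generate data from the assumed model itself (true mu = 0).
        y = rng.normal(0.0, 1.0, size=n)
        # Posterior draws of mu, then one replicated dataset per draw.
        mu = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=n_draws)
        y_rep = rng.normal(mu[:, None], 1.0, size=(n_draws, n))
        # Fraction of replications whose mean is at least the observed mean.
        p_values[k] = np.mean(y_rep.mean(axis=1) >= y.mean())
    return p_values

p_values = simulate_ppp_under_true_model()
uniform_sd = 1.0 / np.sqrt(12.0)  # standard deviation of a Uniform(0, 1) variable
print(f"sd of simulated p-values: {p_values.std():.3f}")
print(f"sd if p-values were uniform: {uniform_sd:.3f}")
```

A calibrated frequentist p-value would spread uniformly over the unit interval across repeated datasets. Here the spread should be far smaller, because the replications are centered on the very statistic the observed data were used to estimate.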

Key figures

Donald Rubin
Andrew Gelman
Xiao-Li Meng
Hal Stern

Seminal works

gelman1996
rubin1984

Frequently asked questions

Does a posterior predictive p-value near 0.5 mean my model is correct?: No. Posterior predictive checks can reveal misfit in the features you test but cannot confirm that a model is correct; a non-extreme p-value only means the model is not contradicted by that particular test quantity.
