Synthetic Data Generation for Disclosure Control

Synthetic data generation is a statistical disclosure limitation technique proposed by Donald Rubin in 1993, in which values in a confidential dataset are replaced by draws from a posterior predictive distribution fitted to that dataset, rather than released directly. When the synthesis model is well specified, the resulting artificial records preserve the joint statistical structure of the original data while reducing the risk that real individuals can be identified, so analysts can work with a publicly releasable dataset that behaves like the original for most inferential purposes.
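
The idea of replacing confidential values with posterior predictive draws can be illustrated with a minimal sketch. It assumes a single continuous variable modeled as normal with the standard noninformative prior on the mean and variance; that model, the function name `synthesize_normal`, and the simulated "confidential" values are all illustrative assumptions, not part of Rubin's proposal or of any particular library. Real applications would typically use a richer multivariate model.

```python
import random
import statistics


def synthesize_normal(confidential, n_synthetic, n_datasets, rng):
    """Draw synthetic datasets from a normal posterior predictive.

    Hypothetical helper: assumes y ~ N(mu, sigma^2) with the
    noninformative prior p(mu, sigma^2) proportional to 1 / sigma^2.
    """
    n = len(confidential)
    ybar = statistics.fmean(confidential)
    s2 = statistics.variance(confidential)  # sample variance (n - 1 denominator)

    datasets = []
    for _ in range(n_datasets):
        # Posterior draw of the variance: (n - 1) * s^2 / chi-square(n - 1).
        # A chi-square with k degrees of freedom is Gamma(shape=k/2, scale=2).
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # Posterior draw of the mean given that variance.
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # Posterior predictive draws: these replace the real values.
        datasets.append([rng.gauss(mu, sigma2 ** 0.5) for _ in range(n_synthetic)])
    return datasets


rng = random.Random(0)
# Simulated stand-in for a confidential variable (e.g. incomes); not real data.
confidential = [rng.gauss(50_000, 12_000) for _ in range(500)]
synthetic = synthesize_normal(confidential, n_synthetic=500, n_datasets=5, rng=rng)
```

Several synthetic datasets are drawn rather than one so that an analyst can, in principle, account for the extra uncertainty introduced by the synthesis step when combining estimates across them.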

Sources

  1. Rubin, D. B. (1993). Statistical disclosure limitation. Journal of Official Statistics, 9(2), 461–468.

ScholarGate. Synthetic Data Generation (Synthetic Data Generation for Disclosure Control). Retrieved 2026-06-04 from https://scholargate.app/en/privacy/synthetic-data-generation