U.S. Department of Energy
Office of Scientific and Technical Information

Private Tabular Survey Data Products through Synthetic Microdata Generation

Journal Article · Journal of Survey Statistics and Methodology

We propose two synthetic microdata approaches for generating private tabular survey data products for public release. To produce tabular products from survey data, we adapt a pseudo posterior mechanism that downweights each record's likelihood contribution by a weight in [0, 1] based on that record's identification disclosure risk. Applied to an observed survey database, the method achieves an asymptotic global probabilistic differential privacy guarantee. Both approaches jointly synthesize the observed sample distribution of the outcome and the survey weights, so that the two quantities together carry a privacy guarantee. The privacy-protected outcomes and survey weights are then used to construct tabular cell estimates (treating the cell inclusion indicators as known and public) and associated standard errors that correct for survey sampling bias. Through a real data application to the Survey of Doctorate Recipients public use file, and through simulation studies motivated by that application, we demonstrate that both microdata synthesis approaches preserve the utility of tabular products better than the additive-noise approach of the Laplace Mechanism. Moreover, because our approaches release microdata to the public, they enable additional analyses at no extra privacy cost.
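Two ingredients described above, risk-weighted likelihood contributions and survey-weighted cell estimates, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mapping alpha_i = 1 - r_i, the normal model with known sigma and a flat prior, the Hajek cell estimator, and the with-replacement linearization variance are all choices made here for illustration, and every function name is hypothetical. The sketch does not perform the paper's joint synthesis of outcomes and survey weights.

```python
import math


def risk_weights(risks):
    """Map per-record disclosure risks r_i in [0, 1] to likelihood weights.

    Assumption (illustrative only): alpha_i = 1 - r_i, so the riskiest
    records are downweighted the most. The paper's risk measure and
    mapping may differ.
    """
    if any(not 0.0 <= r <= 1.0 for r in risks):
        raise ValueError("risks must lie in [0, 1]")
    return [1.0 - r for r in risks]


def pseudo_posterior_normal_mean(y, alphas, sigma):
    """Pseudo posterior for a normal mean with known sigma and a flat prior.

    Raising each likelihood term N(y_i | mu, sigma^2) to the power alpha_i
    yields N(sum(alpha_i * y_i) / sum(alpha_i), sigma^2 / sum(alpha_i)).
    Returns the pseudo posterior mean and standard deviation.
    """
    a_sum = sum(alphas)
    if a_sum <= 0:
        raise ValueError("at least one record needs a positive weight")
    mean = sum(a * yi for a, yi in zip(alphas, y)) / a_sum
    return mean, sigma / math.sqrt(a_sum)


def hajek_cell_mean(y, w, in_cell):
    """Survey-weighted (Hajek) mean of y within one table cell.

    Cell membership (in_cell) is treated as known. The standard error uses
    a single-stratum, with-replacement linearization (an assumption).
    """
    n = len(y)
    n_cell_hat = sum(wi for wi, c in zip(w, in_cell) if c)
    mean = sum(wi * yi for wi, yi, c in zip(w, y, in_cell) if c) / n_cell_hat
    # Linearized scores; records outside the cell contribute zero.
    z = [wi * (yi - mean) if c else 0.0 for wi, yi, c in zip(w, y, in_cell)]
    z_bar = sum(z) / n
    var = n / (n - 1) * sum((zi - z_bar) ** 2 for zi in z) / n_cell_hat ** 2
    return mean, math.sqrt(var)
```

For example, `pseudo_posterior_normal_mean([1.0, 3.0], risk_weights([0.0, 1.0]), 2.0)` gives a fully at-risk second record zero weight, so the pseudo posterior is centered on the first record alone, with the wider spread of a one-record likelihood. In a synthesis pipeline, draws from such a pseudo posterior would generate the released outcomes (and, in the paper, the survey weights), and `hajek_cell_mean` would then be applied to the synthetic records.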

Research Organization:
Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
SC0014664
OSTI ID:
1982504
Journal Information:
Journal of Survey Statistics and Methodology, Vol. 10, Issue 3; ISSN 2325-0984
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
