BAYESIAN INSIGHTS ON DISCLOSURE LIMITATION: MASK OR IMPUTE?
Statistical agencies seek to disseminate useful data while keeping the risk of confidentiality disclosure low. Recognizing that deidentification of data is generally inadequate to protect its confidentiality against attack by a data snooper, agencies restrict the data they release for general use. Typically, these restricted data procedures have involved transformation or masking of the original, collected data through such devices as adding noise, topcoding, data swapping, and recoding. Recently, proposals have been put forth for the release of synthetic data, simulated from models constructed from the original data. This paper gives a framework for comparing masking and synthetic data as two approaches to disclosure limitation, with particular attention to data utility and disclosure risk. Examples of masking and of synthetic data construction are provided to illustrate the concepts, with an emphasis on data swapping. Insights drawn from the Bayesian paradigm are provided.
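The contrast between masking and synthetic data can be made concrete with a small sketch. The Python code below is an illustration only and is not taken from the paper: the function names, the made-up income values, the noise scale, the cap, and the choice of a normal model for synthesis are all assumptions.

```python
import random
import statistics


def add_noise(values, sd, rng):
    """Masking by additive noise: perturb each value with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, sd) for v in values]


def topcode(values, cap):
    """Masking by topcoding: report every value above the cap as the cap itself."""
    return [min(v, cap) for v in values]


def swap(values, rng):
    """Simplified data swapping: permute one attribute across records.

    The marginal distribution is unchanged, but the link between a value
    and the record it came from is broken.
    """
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def synthesize(values, n, rng):
    """Synthetic data: fit a model to the originals and release simulated draws.

    A normal model is assumed here purely for illustration.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
incomes = [21_000, 34_500, 40_000, 52_000, 61_500, 250_000]  # made-up data

noisy = add_noise(incomes, sd=1_000, rng=rng)
capped = topcode(incomes, cap=100_000)
swapped = swap(incomes, rng)
synthetic = synthesize(incomes, n=len(incomes), rng=rng)
```

In this sketch the masked releases (`noisy`, `capped`, `swapped`) are transformations of the collected values, whereas `synthetic` contains no original value at all, only draws from a model fitted to them.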
- Conference: Conference title not supplied, Conference location not supplied, Conference dates not supplied; Other Information: PBD: 1 Oct 2000
- Research Org: Los Alamos National Lab., Los Alamos, NM (US)
- Sponsoring Org: US Department of Energy (US)
- Country of Publication: United States
- Subject: 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; CONSTRUCTION; TRANSFORMATIONS; STATISTICS; DATA-FLOW PROCESSING