Title: The price of privately releasing contingency tables, and the spectra of random matrices with correlated rows

Abstract

Contingency tables are the method of choice of government agencies for releasing statistical summaries of categorical data. In this paper, we consider lower bounds on how much distortion (noise) is necessary in these tables to provide privacy guarantees when the data being summarized is sensitive. We extend a line of recent work on lower bounds on noise for private data analysis [10, 13, 14, 15] to a natural and important class of functionalities. Our investigation also leads to new results on the spectra of random matrices with correlated rows. Consider a database D consisting of n rows (one per individual), each row comprising d binary attributes. For any subset T of attributes of size |T| = k, the marginal table for T has $2^k$ entries; each entry counts how many times a particular setting of these attributes occurs in the database. Imagine an agency that wishes to release all $\binom{d}{k}$ contingency tables for a given database. For constant k, previous work showed that distortion $\tilde{O}(\min\{n, (n^2 d)^{1/3}, \sqrt{d^k}\})$ is sufficient for satisfying differential privacy, a rigorous definition of privacy that has received extensive recent study. Our main contributions are: (1) For $\epsilon$- and $(\epsilon, \delta)$-differential privacy (with $\epsilon$ constant and $\delta = 1/\mathrm{poly}(n)$), we give a lower bound of $\tilde{\Omega}(\min\{\sqrt{n}, \sqrt{d^k}\})$, which is tight for $n = \tilde{\Omega}(d^k)$. Moreover, for a natural and popular class of mechanisms based on additive noise, our bound can be strengthened to $\Omega(\sqrt{d^k})$, which is tight for all n. Our bounds extend even to non-constant k, losing roughly a factor of $\sqrt{2^k}$ compared to the best known upper bounds for large n. (2) We give efficient polynomial-time attacks which allow an adversary to reconstruct sensitive information given insufficiently perturbed contingency table releases. For constant k, we obtain a lower bound of $\tilde{\Omega}(\min\{\sqrt{n}, \sqrt{d^k}\})$ that applies to a large class of privacy notions, including k-anonymity (along with its variants) and differential privacy. In contrast to our bounds for differential privacy, this bound (a) is shown only for constant k, but (b) is tight for all values of n when k is constant. (3) Our reconstruction-based attacks require a new lower bound on the least singular values of random matrices with correlated rows. For a constant k, consider a matrix M with $\binom{d}{k}$ rows which are formed by taking all possible k-way entry-wise products of an underlying set of d random vectors. We show that even for nearly square matrices with $d^k/\log d$ columns, the least singular value is $\Omega(\sqrt{d^k})$ with high probability; asymptotically, this is the same bound as one gets for a matrix with independent rows. The proof requires several new ideas for analyzing random matrices and could be of independent interest.
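
To make the objects in the abstract concrete, the following is a minimal Python sketch of exact (unperturbed) marginal tables, not code from the paper. The function names `marginal_table` and `all_marginals` and the toy database `D` are illustrative assumptions. It only shows what is being released: one table with 2^k counts for each of the "d choose k" attribute subsets. It does not implement any privacy mechanism or attack.

```python
from itertools import combinations, product
from math import comb


def marginal_table(db, attrs):
    """Count how often each 0/1 setting of the attributes in `attrs` occurs in `db`."""
    # Start every one of the 2^k possible settings at zero so empty cells are kept.
    table = {setting: 0 for setting in product((0, 1), repeat=len(attrs))}
    for row in db:
        table[tuple(row[j] for j in attrs)] += 1
    return table


def all_marginals(db, d, k):
    """Return the marginal table for every size-k subset of the d attributes."""
    return {attrs: marginal_table(db, attrs) for attrs in combinations(range(d), k)}


# Toy database (made-up values): n = 4 individuals, d = 3 binary attributes.
D = [
    (1, 0, 1),
    (1, 1, 0),
    (0, 0, 1),
    (1, 0, 1),
]

tables = all_marginals(D, d=3, k=2)
print(len(tables) == comb(3, 2))  # one table per attribute subset
print(tables[(0, 2)])             # counts for attributes 0 and 2
```

Each table's entries sum to n, since every row falls into exactly one cell. A noisy release would perturb these counts, and the lower bounds above concern how large that perturbation must be.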

Authors:
 Kasiviswanathan, Shiva [1]; Rudelson, Mark [2]; Smith, Adam [3]
  1. Los Alamos National Laboratory
  2. University of Missouri
  3. Pennsylvania State University
Publication Date:
Research Org.:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
990798
Report Number(s):
LA-UR-09-04593; LA-UR-09-4593
TRN: US201020%%609
DOE Contract Number:  
AC52-06NA25396
Resource Type:
Conference
Resource Relation:
Conference: Symposium on Theory of Computing 2010; June 6, 2010; Boston, MA
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; ADDITIVES; DATA ANALYSIS; MATRICES; POLYNOMIALS; PRICES; PROBABILITY; SPECTRA; VECTORS

Citation Formats

Kasiviswanathan, Shiva, Rudelson, Mark, and Smith, Adam. The price of privately releasing contingency tables, and the spectra of random matrices with correlated rows. United States: N. p., 2009. Web. doi:10.1145/1806689.1806795.
Kasiviswanathan, Shiva, Rudelson, Mark, & Smith, Adam. The price of privately releasing contingency tables, and the spectra of random matrices with correlated rows. United States. https://doi.org/10.1145/1806689.1806795
Kasiviswanathan, Shiva, Rudelson, Mark, and Smith, Adam. 2009. "The price of privately releasing contingency tables, and the spectra of random matrices with correlated rows". United States. https://doi.org/10.1145/1806689.1806795. https://www.osti.gov/servlets/purl/990798.
@article{osti_990798,
title = {The price of privately releasing contingency tables, and the spectra of random matrices with correlated rows},
author = {Kasiviswanathan, Shiva and Rudelson, Mark and Smith, Adam},
doi = {10.1145/1806689.1806795},
url = {https://www.osti.gov/biblio/990798},
place = {United States},
year = {2009},
month = {1}
}
