The price of privately releasing contingency tables, and the spectra of random matrices with correlated rows
- Los Alamos National Laboratory
- UNIV OF MISSOURI
- PENNSYLVANIA STATE U
Contingency tables are the method of choice of government agencies for releasing statistical summaries of categorical data. In this paper, we consider lower bounds on how much distortion (noise) is necessary in these tables to provide privacy guarantees when the data being summarized is sensitive. We extend a line of recent work on lower bounds on noise for private data analysis [10, 13, 14, 15] to a natural and important class of functionalities. Our investigation also leads to new results on the spectra of random matrices with correlated rows.
Consider a database $D$ consisting of $n$ rows (one per individual), each row comprising $d$ binary attributes. For any subset $T$ of attributes of size $|T| = k$, the marginal table for $T$ has $2^k$ entries; each entry counts how many times a particular setting of these attributes occurs in the database. Imagine an agency that wishes to release all $\binom{d}{k}$ contingency tables for a given database. For constant $k$, previous work showed that distortion $\tilde{O}(\min\{n, (n^2 d)^{1/3}, \sqrt{d^k}\})$ is sufficient for satisfying differential privacy, a rigorous definition of privacy that has received extensive recent study. Our main contributions are:
(1) For $\epsilon$- and $(\epsilon, \delta)$-differential privacy (with $\epsilon$ constant and $\delta = 1/\mathrm{poly}(n)$), we give a lower bound of $\tilde{\Omega}(\min\{\sqrt{n}, \sqrt{d^k}\})$, which is tight for $n = \tilde{\Omega}(d^k)$. Moreover, for a natural and popular class of mechanisms based on additive noise, our bound can be strengthened to $\Omega(\sqrt{d^k})$, which is tight for all $n$. Our bounds extend even to non-constant $k$, losing roughly a factor of $\sqrt{2^k}$ compared to the best known upper bounds for large $n$.
(2) We give efficient polynomial-time attacks that allow an adversary to reconstruct sensitive information from insufficiently perturbed contingency table releases. For constant $k$, we obtain a lower bound of $\tilde{\Omega}(\min\{\sqrt{n}, \sqrt{d^k}\})$ that applies to a large class of privacy notions, including k-anonymity (along with its variants) and differential privacy. In contrast to our bounds for differential privacy, this bound (a) is shown only for constant $k$, but (b) is tight for all values of $n$ when $k$ is constant.
(3) Our reconstruction-based attacks require a new lower bound on the least singular values of random matrices with correlated rows. For a constant $k$, consider a matrix $M$ with $\binom{d}{k}$ rows formed by taking all possible $k$-way entry-wise products of an underlying set of $d$ random vectors. We show that even for nearly square matrices with $d^k/\log d$ columns, the least singular value is $\Omega(\sqrt{d^k})$ with high probability; asymptotically, this is the same bound as one gets for a matrix with independent rows. The proof requires several new ideas for analyzing random matrices and could be of independent interest.
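The two objects defined in the abstract, the marginal table for an attribute set $T$ and the matrix $M$ of $k$-way entry-wise products, can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`marginal_table`, `all_marginals`, `kway_product_matrix`, `least_singular_value`), the bit ordering of table entries, and the choice of uniform 0/1 entries for the underlying random vectors are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def marginal_table(db, attrs):
    """Count how often each 0/1 setting of the attributes in `attrs` occurs.

    db is an (n, d) array of 0/1 values and attrs is a tuple of k column
    indices. The result has 2**k entries; entry i counts the rows whose
    setting, read as a binary number with attrs[0] as the most significant
    bit (an assumed convention), equals i.
    """
    k = len(attrs)
    weights = 1 << np.arange(k - 1, -1, -1)  # e.g. [4, 2, 1] for k = 3
    codes = db[:, list(attrs)] @ weights
    return np.bincount(codes, minlength=2 ** k)


def all_marginals(db, k):
    """Return the marginal table of every size-k attribute subset."""
    d = db.shape[1]
    return {T: marginal_table(db, T) for T in combinations(range(d), k)}


def kway_product_matrix(vectors, k):
    """Stack the entry-wise product of every k-subset of the d row vectors.

    vectors is a (d, m) array; the result has C(d, k) rows and m columns.
    """
    d = vectors.shape[0]
    rows = [np.prod(vectors[list(T)], axis=0) for T in combinations(range(d), k)]
    return np.array(rows)


def least_singular_value(matrix):
    """Smallest singular value, computed numerically via the SVD."""
    return np.linalg.svd(matrix.astype(float), compute_uv=False).min()


if __name__ == "__main__":
    # Toy database: n = 4 individuals, d = 3 binary attributes.
    db = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]])
    print(marginal_table(db, (0, 2)))  # counts for settings 00, 01, 10, 11

    # Assumed model: d random vectors with independent uniform 0/1 entries.
    rng = np.random.default_rng(0)
    vectors = rng.integers(0, 2, size=(8, 20))
    M = kway_product_matrix(vectors, 2)
    print(M.shape, least_singular_value(M))
```

On the toy database, the table for attributes (0, 2) is `[0, 1, 1, 2]`, and its entries sum to $n = 4$. The singular-value part is only a numerical probe at small sizes; it does not verify the asymptotic $\Omega(\sqrt{d^k})$ bound stated in the abstract.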
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC52-06NA25396
- OSTI ID:
- 990798
- Report Number(s):
- LA-UR-09-04593; LA-UR-09-4593; TRN: US201020%%609
- Resource Relation:
- Conference: Symposium on Theory of Computing 2010; June 6, 2010; Boston, MA
- Country of Publication:
- United States
- Language:
- English