Weighted Fourier Factorizations: Optimal Gaussian Noise for Differentially Private Marginal and Product Queries
By: Christian Janos Lebeda, Aleksandar Nikolov, Haohua Tang
Potential Business Impact:
Protects private data while still giving useful answers.
We revisit the task of releasing marginal queries under differential privacy with additive (correlated) Gaussian noise. We first give a construction for answering arbitrary workloads of weighted marginal queries, over arbitrary domains. Our technique is based on releasing queries in the Fourier basis with independent noise with carefully calibrated variances, and reconstructing the marginal query answers using the inverse Fourier transform. We show that our algorithm, which is a factorization mechanism, is exactly optimal among all factorization mechanisms, both for minimizing the sum of weighted noise variances and for minimizing the maximum noise variance. Unlike algorithms based on optimizing over all factorization mechanisms via semidefinite programming, our mechanism runs in time polynomial in the dataset and the output size. This construction recovers results of Xiao et al. [NeurIPS 2023] with a simpler algorithm and optimality proof, and a better running time. We then extend our approach to a generalization of marginals which we refer to as product queries. We show that our algorithm is still exactly optimal for this more general class of queries. Finally, we show how to embed extended marginal queries, which allow using a threshold predicate on numerical attributes, into product queries. We show that our mechanism is almost optimal among all factorization mechanisms for extended marginals, in the sense that it achieves the optimal (maximum or average) noise variance up to lower order terms.
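The release-then-reconstruct idea in the abstract can be illustrated with a minimal sketch, restricted to binary attributes for simplicity. It is not the paper's algorithm: the function names (`fourier_coefficient`, `subsets`, `noisy_marginals`) and the `sigma` callback are hypothetical, and the per-coefficient noise scale is left to the caller, whereas the paper derives the optimal calibrated variances. The sketch only shows that marginals over an attribute set T are recovered from the Fourier (parity) coefficients indexed by subsets of T.

```python
import itertools
import numpy as np

def fourier_coefficient(data, S):
    """Sum over records of (-1)^(x_i summed over i in S), for 0/1 data of shape (n, d)."""
    if len(S) == 0:
        return float(data.shape[0])
    parity = data[:, list(S)].sum(axis=1) % 2
    return float(np.sum(1 - 2 * parity))

def subsets(T):
    """All subsets of the tuple T, as tuples that preserve T's order."""
    return [S for r in range(len(T) + 1) for S in itertools.combinations(T, r)]

def noisy_marginals(data, workload, sigma, rng):
    """Release noisy Fourier coefficients once, then reconstruct each marginal.

    workload: list of sorted tuples of attribute indices (assumption: sorted,
              so shared subsets map to the same coefficient).
    sigma:    hypothetical callback S -> noise standard deviation; the paper
              calibrates these variances, here they are left unspecified.
    """
    needed = sorted({S for T in workload for S in subsets(T)})
    # Independent Gaussian noise per Fourier coefficient.
    noisy = {S: fourier_coefficient(data, S) + rng.normal(0.0, sigma(S))
             for S in needed}
    out = {}
    for T in workload:
        table = {}
        for t in itertools.product([0, 1], repeat=len(T)):
            value = dict(zip(T, t))
            # Inverse transform: average signed coefficients over subsets of T.
            total = sum((-1) ** sum(value[i] for i in S) * noisy[S]
                        for S in subsets(T))
            table[t] = total / 2 ** len(T)
        out[T] = table
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.integers(0, 2, size=(1000, 3))
    # Uniform noise scale, chosen arbitrarily for illustration only.
    answers = noisy_marginals(data, [(0, 1), (2,)], lambda S: 5.0, rng)
    print(answers[(0, 1)])
```

Because every marginal is a linear function of the same noisy coefficients, overlapping marginals share noise and stay mutually consistent; choosing the variances per coefficient is where the optimality result of the paper comes in, and that step is not reproduced here.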