Differentially Private High-dimensional Variable Selection via Integer Programming
By: Petros Prastakos, Kayhan Behdin, Rahul Mazumder
Potential Business Impact:
Helps privacy-preserving machine learning pick out the most important clues in data.
Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale non-private sparse regression, known as Best Subset Selection (BSS), with millions of variables in minutes. However, extending these algorithmic advances to the setting of Differential Privacy (DP) has remained largely unexplored. In this paper, we introduce two new pure differentially private estimators for sparse variable selection, leveraging modern MIP techniques. Our framework is general and applies broadly to problems such as sparse regression and classification, and we provide theoretical support recovery guarantees in the case of BSS. Inspired by the exponential mechanism, we develop structured sampling procedures that efficiently explore the non-convex objective landscape, avoiding the exhaustive combinatorial search required by the exponential mechanism. We complement our theoretical findings with extensive numerical experiments, using both least squares and hinge loss as our objective function, and demonstrate that our methods achieve state-of-the-art empirical support recovery, outperforming competing algorithms in settings with up to $p=10^4$ features.
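To make the baseline concrete, below is a minimal sketch of the plain exponential mechanism applied to BSS, i.e. the brute-force approach whose exhaustive search over all size-$k$ supports the paper's structured sampling is designed to avoid. This is not the authors' estimator: the function name `exp_mech_bss` and the utility-sensitivity bound `delta_u` are illustrative assumptions, and a valid sensitivity would depend on how the data are bounded and normalized.

```python
# Plain exponential mechanism for Best Subset Selection (brute-force
# baseline): enumerate every size-k support, score it by the negative
# residual sum of squares of its least-squares fit, and sample a support
# with probability proportional to exp(eps * score / (2 * delta_u)).
# Enumerating C(p, k) supports is feasible only for tiny p.
import itertools
import numpy as np

def exp_mech_bss(X, y, k, eps, delta_u=1.0, seed=None):
    """Sample a size-k support; delta_u is an assumed utility-sensitivity bound."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    supports = list(itertools.combinations(range(p), k))
    scores = np.empty(len(supports))
    for i, S in enumerate(supports):
        cols = list(S)
        # Utility of a support: negative RSS of the restricted least-squares fit.
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        scores[i] = -np.sum((y - X[:, cols] @ beta) ** 2)
    logits = eps * scores / (2.0 * delta_u)
    logits -= logits.max()  # shift for numerical stability before exponentiating
    probs = np.exp(logits)
    probs /= probs.sum()
    return supports[rng.choice(len(supports), p=probs)]

if __name__ == "__main__":
    # Tiny synthetic demo with true support {1, 5}; real high-dimensional
    # instances require the structured sampling the paper develops.
    rng = np.random.default_rng(0)
    n, p, k = 50, 8, 2
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[[1, 5]] = 3.0
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    print(exp_mech_bss(X, y, k, eps=5.0, seed=1))
```

Because the support count $\binom{p}{k}$ grows combinatorially, this baseline becomes intractable well before $p=10^4$, which is precisely the regime the paper's MIP-based sampling procedures target.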
Similar Papers
An Interactive Framework for Finding the Optimal Trade-off in Differential Privacy
Machine Learning (CS)
Finds the best privacy trade-off for data without losing accuracy.
A Markov Decision Process for Variable Selection in Branch & Bound
Machine Learning (CS)
Teaches computers to solve hard problems faster.
Optimality and computational barriers in variable selection under dependence
Statistics Theory
Finds the best way to pick important data.