Score: 4

Differentially Private High-dimensional Variable Selection via Integer Programming

Published: October 24, 2025 | arXiv ID: 2510.22062v1

By: Petros Prastakos, Kayhan Behdin, Rahul Mazumder

BigTech Affiliations: Massachusetts Institute of Technology, LinkedIn

Potential Business Impact:

Lets machine learning systems pick out the most informative features while keeping individuals' data private.

Business Areas:
A/B Testing, Data and Analytics

Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale non-private sparse regression - known as Best Subset Selection (BSS) - with millions of variables in minutes. However, extending these algorithmic advances to the setting of Differential Privacy (DP) has remained largely unexplored. In this paper, we introduce two new pure differentially private estimators for sparse variable selection, leveraging modern MIP techniques. Our framework is general and applies broadly to problems like sparse regression or classification, and we provide theoretical support recovery guarantees in the case of BSS. Inspired by the exponential mechanism, we develop structured sampling procedures that efficiently explore the non-convex objective landscape, avoiding the exhaustive combinatorial search required by the exponential mechanism. We complement our theoretical findings with extensive numerical experiments, using both least squares and hinge loss for our objective function, and demonstrate that our methods achieve state-of-the-art empirical support recovery, outperforming competing algorithms in settings with up to $p=10^4$ features.
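To make the baseline concrete, below is a minimal sketch of the classic exponential mechanism for best subset selection that the abstract says inspired the authors' approach. It enumerates every size-$k$ support and samples one with probability proportional to $\exp(\varepsilon \cdot u(S) / (2\Delta))$, where the utility $u(S)$ is the negative residual sum of squares. This exhaustive enumeration is exactly what the paper's MIP-based structured sampling avoids; the function name and the `sensitivity` bound here are illustrative assumptions, not the paper's implementation.

```python
import itertools
import numpy as np

def exp_mechanism_bss(X, y, k, epsilon, sensitivity, seed=0):
    """Classic exponential mechanism over all size-k supports (baseline sketch).

    Enumerates every k-subset of features, scores each by negative
    least-squares loss, and samples a support with probability
    proportional to exp(epsilon * utility / (2 * sensitivity)).
    `sensitivity` is an assumed bound on how much one record can
    change the utility; in practice it must be derived for the data.
    """
    n, p = X.shape
    supports = list(itertools.combinations(range(p), k))
    utilities = []
    for S in supports:
        Xs = X[:, S]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ beta) ** 2)
        utilities.append(-rss)  # higher utility = better fit
    u = np.array(utilities)
    # Subtract the max utility before exponentiating for numerical stability;
    # this does not change the sampling distribution.
    w = np.exp(epsilon * (u - u.max()) / (2.0 * sensitivity))
    probs = w / w.sum()
    idx = np.random.default_rng(seed).choice(len(supports), p=probs)
    return supports[idx]
```

The cost of this baseline scales as $\binom{p}{k}$ least-squares fits, which is already intractable for modest $p$; the paper's contribution is sampling from (an approximation of) this distribution without enumerating the supports.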

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
34 pages

Category
Statistics: Machine Learning (Stat)