High-Dimensional Differentially Private Quantile Regression: Distributed Estimation and Statistical Inference
By: Ziliang Shen, Caixing Wang, Shaoli Wang and more
Potential Business Impact:
Keeps your private data safe when analyzing it.
With the development of big data and machine learning, privacy concerns have become increasingly critical, especially when handling heterogeneous datasets containing sensitive personal information. Differential privacy provides a rigorous framework for safeguarding individual privacy while enabling meaningful statistical analysis. In this paper, we propose a differentially private quantile regression method for high-dimensional data in a distributed setting. Quantile regression is a powerful and robust tool for modeling the relationships between the covariates and responses in the presence of outliers or heavy-tailed distributions. To address the computational challenges due to the non-smoothness of the quantile loss function, we introduce a Newton-type transformation that reformulates the quantile regression task into an ordinary least squares problem. Building on this, we develop a differentially private estimation algorithm with iterative updates, ensuring both near-optimal statistical accuracy and formal privacy guarantees. For inference, we further propose a differentially private debiased estimator, which enables valid confidence interval construction and hypothesis testing. Additionally, we propose a communication-efficient and differentially private bootstrap for simultaneous hypothesis testing in high-dimensional quantile regression, suitable for distributed settings with both small and abundant local data. Extensive simulations demonstrate the robustness and effectiveness of our methods in practical scenarios.
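The abstract does not spell out the Newton-type transformation, so the following is only a minimal sketch of one common way such a reformulation can be written, not the authors' algorithm. It assumes a pseudo-response of the form ỹ_i = x_iᵀβ − (1{y_i ≤ x_iᵀβ} − τ) / f̂(0), where f̂(0) is a kernel estimate of the residual density at zero, and then solves an ordinary least squares problem on ỹ at each iteration. The function names, the bandwidth rule, and the `noise_std` parameter are hypothetical; the optional Gaussian noise is a placeholder for a privacy mechanism and is not calibrated to any (ε, δ) budget. The sketch is low-dimensional and single-machine, so it omits the sparsity penalty and the distributed aggregation the paper addresses.

```python
import numpy as np


def density_at_zero(residuals, bandwidth):
    """Gaussian-kernel estimate of the residual density at 0."""
    u = residuals / bandwidth
    return np.mean(np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)) / bandwidth


def newton_type_quantile_fit(X, y, tau, n_iter=10, noise_std=0.0, rng=None):
    """Illustrative Newton-type quantile regression via repeated OLS.

    Assumption: each iteration regresses a pseudo-response on X, which is
    algebraically a Newton step on the quantile loss with Hessian
    approximated by f(0) * X'X / n.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape

    # Initial estimate: plain least squares.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]

    # Hypothetical bandwidth choice (Silverman-style rule of thumb).
    init_resid = y - X @ beta
    bandwidth = max(1.06 * np.std(init_resid) * n ** (-0.2), 1e-8)

    for _ in range(n_iter):
        resid = y - X @ beta
        f0 = max(density_at_zero(resid, bandwidth), 1e-8)

        # Pseudo-response: current fit minus the scaled quantile-loss subgradient.
        pseudo_y = X @ beta - ((resid <= 0).astype(float) - tau) / f0

        # The quantile problem is now an ordinary least squares problem.
        beta = np.linalg.lstsq(X, pseudo_y, rcond=None)[0]

        # Placeholder perturbation; NOT calibrated to a privacy budget.
        if noise_std > 0:
            beta = beta + rng.normal(0.0, noise_std, size=p)

    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
    print(newton_type_quantile_fit(X, y, tau=0.5))
```

A real differentially private version would need bounded sensitivity (for example by clipping covariates and responses) and noise scaled to that sensitivity and the chosen privacy budget; the sketch leaves those steps out on purpose.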