FusionDP: Foundation Model-Assisted Differentially Private Learning for Partially Sensitive Features
By: Linghui Zeng, Ruixuan Liu, Atiquer Rahman Sarkar and more
Potential Business Impact:
Protects private health info while improving AI.
Protecting sensitive training data is central to privacy-preserving machine learning. In practice, however, only a subset of features may require protection. For instance, in ICU data, demographic attributes such as age and gender carry higher privacy risk because of their re-identification potential, whereas raw lab results are generally less sensitive. Traditional DP-SGD enforces privacy protection on every feature of each sample, which leads to excessive noise injection and significant utility degradation. We propose FusionDP, a two-step framework that improves model utility under feature-level differential privacy. First, FusionDP uses large foundation models to impute sensitive features from non-sensitive features, treating the models as external priors that provide high-quality estimates of sensitive attributes without accessing the true values during model training. Second, we introduce a modified DP-SGD algorithm that trains models on both original and imputed features while formally preserving the privacy of the original sensitive features. We evaluate FusionDP on two modalities: a sepsis prediction task on tabular data from PhysioNet and a clinical note classification task from MIMIC-III. Compared with privacy-preserving baselines, FusionDP significantly improves model performance while maintaining rigorous feature-level privacy, demonstrating the potential of foundation model-driven imputation to improve the privacy-utility trade-off across modalities.
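For context, the traditional DP-SGD baseline that the abstract contrasts against can be sketched roughly as below. This is a minimal pure-Python illustration of the usual clip-then-add-noise update applied to whole per-sample gradients; it is not FusionDP's modified algorithm, whose details are not given here. The names `clip`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are illustrative assumptions, and no privacy accounting is performed.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One standard DP-SGD update (illustrative sketch only).

    Each sample's gradient is clipped, the clipped gradients are summed,
    Gaussian noise with std = noise_multiplier * max_norm is added to the
    sum, and the noisy sum is averaged over the batch.
    """
    dim = len(weights)
    total = [0.0] * dim
    for grad in per_sample_grads:
        clipped = clip(grad, max_norm)
        for i in range(dim):
            total[i] += clipped[i]

    sigma = noise_multiplier * max_norm
    batch_size = len(per_sample_grads)
    noisy_mean = [
        (total[i] + rng.gauss(0.0, sigma)) / batch_size for i in range(dim)
    ]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


# Usage: two toy per-sample gradients, one above and one below the clip norm.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
updated = dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.0, rng=rng)
```

Because the noise scale is tied to the clipping norm of the full gradient, every feature's contribution is perturbed equally, which is the source of the utility loss that motivates treating only the sensitive features as private.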
Similar Papers
Differential Privacy: Gradient Leakage Attacks in Federated Learning Environments
Machine Learning (CS)
Protects private data when computers learn together.
Differential Privacy for Deep Learning in Medicine
Machine Learning (CS)
Keeps patient data safe while training AI.
Graph Structure Learning with Privacy Guarantees for Open Graph Data
Machine Learning (CS)
Keeps private info safe when sharing data.