CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage
By: Na Li, Yansong Gao, Hongsheng Hu, and more
Potential Business Impact:
Finds hidden personal info in smaller AI models.
Model compression is crucial for minimizing memory footprint and accelerating inference in deep learning (DL) models, including recent foundation models such as large language models (LLMs). Users can access different compressed versions of a model according to their resources and budget. However, while existing compression operations primarily focus on optimizing the trade-off between resource efficiency and model performance, the privacy risks introduced by compression remain overlooked and insufficiently understood. In this work, through the lens of membership inference attacks (MIAs), we propose CompLeak, the first privacy risk evaluation framework examining three widely used compression configurations, namely pruning, quantization, and weight clustering, supported by the commercial model compression frameworks Google's TensorFlow-Lite (TF-Lite) and Facebook's PyTorch Mobile. CompLeak has three variants, depending on how many compressed models are available and whether the original model is accessible. CompLeakNR starts by adopting existing MIA methods to attack a single compressed model, and finds that different compressed models affect members and non-members differently. When the original model and one compressed model are available, CompLeakSR leverages the compressed model as a reference to the original model and uncovers more privacy leakage by combining meta information (e.g., confidence vectors) from both models. When multiple compressed models are available, with or without access to the original model, CompLeakMR exploits the leakage information from the multiple compressed versions to substantially amplify the overall privacy leakage. We conduct extensive experiments on seven diverse model architectures (from ResNet to the foundation models BERT and GPT-2) and six image and text benchmark datasets.
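To make the reference-based idea concrete, the sketch below shows one simple way a membership inference attack could combine confidence vectors from an original model and a compressed version of it. This is a minimal illustration under stated assumptions, not the authors' CompLeak implementation: all function names, the logistic-regression attack classifier, and the synthetic confidence vectors are hypothetical stand-ins for exposition.

```python
# Illustrative sketch only: a reference-based membership inference feature in the
# spirit of combining confidence vectors from an original and a compressed model.
# All names here are hypothetical and not taken from the CompLeak paper or code.

import numpy as np
from sklearn.linear_model import LogisticRegression


def attack_features(conf_original: np.ndarray, conf_compressed: np.ndarray) -> np.ndarray:
    """Build per-sample attack features from the confidence vectors of the
    original model and one compressed model (e.g., pruned or quantized).

    Concatenating both vectors plus their difference is one simple way to expose
    how compression shifts a sample's confidence, which may differ between
    members and non-members of the training set."""
    return np.concatenate(
        [conf_original, conf_compressed, conf_original - conf_compressed], axis=-1
    )


def train_attack_model(feat_members: np.ndarray, feat_nonmembers: np.ndarray) -> LogisticRegression:
    """Fit a binary attack classifier on features labeled member (1) vs. non-member (0)."""
    X = np.vstack([feat_members, feat_nonmembers])
    y = np.concatenate([np.ones(len(feat_members)), np.zeros(len(feat_nonmembers))])
    return LogisticRegression(max_iter=1000).fit(X, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, n = 10, 500

    # Synthetic stand-ins for softmax outputs: members typically receive
    # sharper (more confident) predictions than non-members.
    def fake_confidences(sharpness: float) -> np.ndarray:
        logits = rng.normal(0.0, sharpness, size=(n, num_classes))
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    members_orig, members_comp = fake_confidences(3.0), fake_confidences(2.5)
    nonmembers_orig, nonmembers_comp = fake_confidences(1.0), fake_confidences(1.0)

    attack = train_attack_model(
        attack_features(members_orig, members_comp),
        attack_features(nonmembers_orig, nonmembers_comp),
    )
    # Membership score for one target sample, given outputs from both model versions.
    target = attack_features(members_orig[:1], members_comp[:1])
    print("membership probability:", attack.predict_proba(target)[0, 1])
```

In the same spirit, an attacker with several compressed versions (as in the CompLeakMR setting) could extend the feature vector by concatenating the confidence vectors from each compressed model, though the paper's actual aggregation method may differ.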
Similar Papers
How Quantization Impacts Privacy Risk on LLMs for Code?
Software Engineering
Makes AI code tools safer by hiding private data.
Evaluating the Dynamics of Membership Privacy in Deep Learning
Machine Learning (CS)
Protects private data used to train AI.
Model Compression vs. Adversarial Robustness: An Empirical Study on Language Models for Code
Software Engineering
Makes AI code checkers less safe when smaller.