Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models
By: Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, and more
Potential Business Impact:
Finds if AI learned your private words.
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, weaker attacks have been shown to be brittle - achieving close-to-arbitrary success - and insights from strong attacks in simplified settings do not translate to today's LLMs. These challenges have prompted an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA - one of the strongest MIAs - to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their effectiveness, however, remains limited (e.g., AUC < 0.7) in practical settings; and (3) the relationship between MIA success and related privacy metrics is not as straightforward as prior work has suggested.
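For readers unfamiliar with LiRA (the Likelihood Ratio Attack of Carlini et al.), the sketch below illustrates the general recipe the paper scales up: score a candidate text by comparing the target model's loss on it against the distribution of losses from reference models. This is a simplified, offline-style variant under assumed Hugging Face GPT-2 checkpoints; the function names, the Gaussian fit, and the checkpoint paths are illustrative assumptions, not taken from the paper's implementation.

```python
# Minimal sketch of a LiRA-style membership score for a causal LM.
# NOT the authors' implementation: the offline variant, the Gaussian fit on
# reference losses, and names such as sequence_nll / lira_score /
# reference_paths are illustrative assumptions.
import numpy as np
import torch
from scipy.stats import norm
from transformers import GPT2LMHeadModel, GPT2TokenizerFast


def sequence_nll(model, tokenizer, text: str) -> float:
    """Mean per-token negative log-likelihood of `text` under `model`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()


def lira_score(target_nll: float, reference_nlls: list) -> float:
    """Offline LiRA-style score: Pr[reference NLL > target NLL] under a
    Gaussian fit to losses from reference models that did not train on the
    candidate text. Values near 1 mean the target model fits the text
    unusually well, i.e. evidence of membership."""
    mu, sigma = np.mean(reference_nlls), np.std(reference_nlls) + 1e-8
    return float(1.0 - norm.cdf(target_nll, loc=mu, scale=sigma))


# Usage sketch (checkpoint paths are placeholders for locally trained models):
# target = GPT2LMHeadModel.from_pretrained("path/to/target-checkpoint")
# refs = [GPT2LMHeadModel.from_pretrained(p) for p in reference_paths]
# tok = GPT2TokenizerFast.from_pretrained("gpt2")
# t_nll = sequence_nll(target, tok, candidate_text)
# r_nlls = [sequence_nll(m, tok, candidate_text) for m in refs]
# print(lira_score(t_nll, r_nlls))
```

The expensive step is producing the reference models themselves, which is exactly the cost the paper highlights when scaling this attack to pre-trained LLMs.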
Similar Papers
Membership Inference Attacks on Large-Scale Models: A Survey
Machine Learning (CS)
Finds if your private info trained AI.
Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Machine Learning (CS)
Finds best ways to check if AI learned private info.
On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models
Machine Learning (CS)
Stops AI from accidentally sharing private secrets.