Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework
By: Xiaoyu Luo , Yiyi Chen , Qiongxiu Li and more
Potential Business Impact:
AI doesn't remember your private info as much.
Large Language Models (LLMs) have been reported to "leak" Personally Identifiable Information (PII), with successful PII reconstruction often interpreted as evidence of memorization. We propose a principled revision of memorization evaluation for LLMs, arguing that PII leakage should be evaluated under low lexical cue conditions, where target PII cannot be reconstructed through prompt-induced generalization or pattern completion. We formalize Cue-Resistant Memorization (CRM) as a cue-controlled evaluation framework and a necessary condition for valid memorization evaluation, explicitly conditioning on prompt-target overlap cues. Using CRM, we conduct a large-scale multilingual re-evaluation of PII leakage across 32 languages and multiple memorization paradigms. Revisiting reconstruction-based settings, including verbatim prefix-suffix completion and associative reconstruction, we find that their apparent effectiveness is driven primarily by direct surface-form cues rather than by true memorization. When such cues are controlled for, reconstruction success diminishes substantially. We further examine cue-free generation and membership inference, both of which exhibit extremely low true positive rates. Overall, our results suggest that previously reported PII leakage is better explained by cue-driven behavior than by genuine memorization, highlighting the importance of cue-controlled evaluation for reliably quantifying privacy-relevant memorization in LLMs.
Similar Papers
Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models
Information Retrieval
Finds private info hidden in AI.
Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
Cryptography and Security
Stops computers from accidentally sharing private secrets.
Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training
Computation and Language
AI learns private info better, even when it's removed.