Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs
By: Jérémie Dentan, Davide Buscaldi, Sonia Vanier
Potential Business Impact:
Helps tell apart text an AI model guessed from text it recalled from its training data.
Verbatim memorization in Large Language Models (LLMs) is a multifaceted phenomenon involving distinct underlying mechanisms. We introduce a novel method to analyze the different forms of memorization described by the existing taxonomy. Specifically, we train Convolutional Neural Networks (CNNs) on the attention weights of the LLM and evaluate the alignment between this taxonomy and the attention weights involved in decoding. We find that the existing taxonomy performs poorly and fails to reflect distinct mechanisms within the attention blocks. We propose a new taxonomy that maximizes alignment with the attention weights, consisting of three categories: memorized samples that are guessed using language modeling abilities, memorized samples that are recalled due to high duplication in the training set, and non-memorized samples. Our results reveal that few-shot verbatim memorization does not correspond to a distinct attention mechanism. We also show that a significant proportion of extractable samples are in fact guessed by the model and should therefore be studied separately. Finally, we develop a custom visual interpretability technique to localize the regions of the attention weights involved in each form of memorization.
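As a rough illustration of the data flow described in the abstract (attention weights treated as an image, passed through a convolutional classifier, and mapped to one of the three proposed categories), here is a minimal sketch in plain Python. It is not the authors' architecture: the single convolution layer, the 3x3 kernels, the global average pooling, the random untrained weights, and the toy attention map are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random

# The three categories of the taxonomy proposed in the abstract.
CLASSES = ["guessed", "recalled", "non-memorized"]


def toy_causal_attention(n, rng):
    """Random lower-triangular attention map whose rows sum to 1.

    Assumption: a stand-in for the attention weights of one head of a real LLM.
    """
    rows = []
    for i in range(n):
        scores = [rng.gauss(0.0, 1.0) for _ in range(i + 1)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        rows.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return rows


def conv2d_relu(image, kernel):
    """Valid 2D cross-correlation of a square image with a square kernel, then ReLU."""
    k = len(kernel)
    out_size = len(image) - k + 1
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            acc = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(max(0.0, acc))
        out.append(row)
    return out


def global_avg_pool(fmap):
    """Average all values of a feature map into a single number."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def classify(attention, kernels, head):
    """Forward pass: conv layer, global average pooling, linear head, argmax."""
    features = [global_avg_pool(conv2d_relu(attention, k)) for k in kernels]
    logits = [sum(w * f for w, f in zip(ws, features)) + b for ws, b in head]
    best = max(range(len(logits)), key=logits.__getitem__)
    return CLASSES[best], logits


rng = random.Random(0)
attention = toy_causal_attention(8, rng)

# Untrained, randomly initialized parameters (hypothetical sizes: 4 kernels of 3x3).
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(4)]
head = [([rng.uniform(-1, 1) for _ in range(4)], 0.0) for _ in CLASSES]

label, logits = classify(attention, kernels, head)
print(label, [round(x, 3) for x in logits])
```

Because the weights are random, the predicted label carries no meaning; the sketch only shows how an attention map can be fed to a convolutional classifier with one output per category. In practice the parameters would be learned from labeled samples, and a deep learning framework would replace the hand-written loops.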
Similar Papers
Unveiling Over-Memorization in Finetuning LLMs for Reasoning Tasks
Computation and Language
Makes AI smarter without copying answers exactly.
LCMem: A Universal Model for Robust Image Memorization Detection
CV and Pattern Recognition
Finds if AI copied private images.
Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications
Computation and Language
Doctors' AI remembers patient secrets, good and bad.