Not Every Token Needs Forgetting: Selective Unlearning to Limit Change in Utility in Large Language Model Unlearning
By: Yixin Wan, Anil Ramakrishna, Kai-Wei Chang, and more
Potential Business Impact:
Removes bad info without forgetting good knowledge.
Large Language Model (LLM) unlearning has recently gained significant attention, driven by the need to remove unwanted information, such as private, sensitive, or copyrighted content, from LLMs. However, conventional unlearning approaches indiscriminately update model parameters to forget all tokens in a target document, including common tokens (e.g., pronouns, prepositions, general nouns) that carry general knowledge. In this paper, we highlight that not every token needs forgetting. We propose Selective Unlearning (SU), which identifies a critical subset of tokens within the forget set that is relevant to the unwanted information, and unlearns only those tokens. Experiments on two benchmarks and six baseline unlearning algorithms demonstrate that SU not only achieves effective unlearning on the targeted forget data, but also largely preserves the model's utility on the retain set.
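To illustrate the core idea, here is a minimal sketch of a token-selective forgetting loss. It assumes a gradient-ascent style unlearning objective and a precomputed boolean mask marking which tokens are relevant to the unwanted information; the mask's selection criterion, the function name, and the tensor shapes are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def selective_unlearning_loss(logits, labels, token_mask):
    """Forgetting loss applied only to selected tokens (illustrative sketch).

    logits:     (batch, seq_len, vocab) model outputs on a forget document
    labels:     (batch, seq_len) target token ids
    token_mask: (batch, seq_len) boolean mask, True for tokens judged relevant
                to the unwanted information (selection criterion is a
                placeholder; the paper's own scoring is not reproduced here)
    """
    # Per-token negative log-likelihood; cross_entropy expects (batch, vocab, seq_len).
    per_token_nll = F.cross_entropy(
        logits.transpose(1, 2), labels, reduction="none"
    )  # -> (batch, seq_len)

    # Keep only the selected tokens; common tokens (pronouns, prepositions,
    # general nouns) contribute nothing, so the general knowledge they carry
    # is not pushed to be forgotten.
    masked_nll = per_token_nll * token_mask.float()

    # Negate so that minimizing this loss *increases* NLL on selected tokens,
    # i.e., gradient-ascent forgetting restricted to the critical subset.
    return -masked_nll.sum() / token_mask.float().sum().clamp(min=1.0)
```

In a conventional unlearning baseline the mask would effectively be all ones (every token in the forget document is pushed to be forgotten); restricting it to the critical subset is what limits the collateral change in utility.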
Similar Papers
Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy
Cryptography and Security
Makes AI forget bad things without breaking good things.
Not All Tokens Are Meant to Be Forgotten
Machine Learning (CS)
Removes bad memories from AI without losing good ones.
Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
Machine Learning (CS)
Removes bad info from AI, making it safer.