UCD: Unlearning in LLMs via Contrastive Decoding
By: Vinith M. Suriyakumar, Ayush Sekhari, Ashia Wilson
Potential Business Impact:
Removes bad info from AI without breaking it.
Machine unlearning aims to remove specific information, e.g., sensitive or undesirable content, from large language models (LLMs) while preserving overall performance. We propose an inference-time unlearning algorithm that uses contrastive decoding, leveraging two smaller auxiliary models, one trained without the forget set and one trained with it, and steering the original model's outputs with their difference at inference time. Our strategy substantially improves the tradeoff between unlearning effectiveness and model utility. We evaluate our approach on two unlearning benchmarks, TOFU and MUSE. Results show notable gains in both forget quality and retained performance compared to prior approaches, suggesting that contrastive decoding offers an efficient, practical avenue for unlearning concepts in large-scale models.
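The core idea, combining the base model's logits with the difference between the two auxiliary models, can be sketched as a simple logit adjustment. This is a minimal illustrative sketch, not the paper's implementation: the function names, the scaling parameter `alpha`, and the toy logit values are all assumptions introduced here for clarity.

```python
import numpy as np

def contrastive_unlearning_logits(base, aux_retain, aux_forget, alpha=1.0):
    """Adjust the base model's next-token logits using two auxiliary models.

    aux_retain: logits from the small model trained WITHOUT the forget set.
    aux_forget: logits from the small model trained WITH the forget set.
    Tokens the forget-trained model favors over the retain-trained model
    are pushed down; alpha (an assumed knob) scales the correction.
    """
    return base + alpha * (aux_retain - aux_forget)

# Toy example with a 3-token vocabulary. Token 0 stands in for content
# learned from the forget set: the forget-trained auxiliary model boosts
# it, while the retain-trained auxiliary model does not.
base       = np.array([2.0, 1.0, 0.5])  # base model favors token 0
aux_retain = np.array([1.0, 1.0, 0.5])
aux_forget = np.array([3.0, 1.0, 0.5])  # extra mass on the "forgotten" token

adjusted = contrastive_unlearning_logits(base, aux_retain, aux_forget)
print(int(np.argmax(base)))      # 0: without correction, forget content wins
print(int(np.argmax(adjusted)))  # 1: correction suppresses the forget token
```

In practice the adjustment would run inside the decoding loop at every step, with all three models scoring the same prefix; the sketch above only shows the per-step logit arithmetic.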
Similar Papers
Deep Contrastive Unlearning for Language Models
Computation and Language
Removes private info from AI without breaking it.
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models
Computation and Language
Cleans unwanted info from AI without retraining.
Learning-Time Encoding Shapes Unlearning in LLMs
Computation and Language
Teaches computers to forget bad or wrong information.