Are We Truly Forgetting? A Critical Re-examination of Machine Unlearning Evaluation Protocols
By: Yongwoo Kim, Sungmin Cha, Donghyun Kim
Potential Business Impact:
Makes AI forget specific information safely.
Machine unlearning removes the influence of specific data points from a trained model while preserving performance on the retained data, addressing privacy and legal requirements. Despite its importance, existing unlearning evaluations tend to focus on logit-based metrics (e.g., accuracy) under small-scale scenarios. We observe that this can create a false sense of security about unlearning approaches in real-world settings. In this paper, we conduct a comprehensive new evaluation that applies representation-based analyses to unlearned models under large-scale scenarios, verifying whether unlearning approaches genuinely eliminate the targeted forget data at the level of the model's representations. Our analysis reveals that current state-of-the-art unlearning approaches either completely degrade the representational quality of the unlearned model or merely modify the classifier (i.e., the last layer), thereby achieving strong logit-based metrics while retaining substantial representational similarity to the original model. Furthermore, we introduce a more rigorous unlearning evaluation setup in which the forgetting classes are semantically similar to downstream task classes, so that genuine forgetting requires feature representations to diverge significantly from those of the original model; this enables a stricter evaluation from the representation perspective. We hope our benchmark serves as a standardized protocol for evaluating unlearning algorithms under realistic conditions.
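One common way to quantify representational similarity between an original and an unlearned model is linear centered kernel alignment (CKA) over features extracted from each model; CKA here is our illustrative choice, not necessarily the paper's exact metric. A minimal sketch, assuming hypothetical feature matrices (samples × dimensions) taken from, say, the penultimate layer of each model:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices
    of shape (n_samples, dim). Returns a value in [0, 1]: ~1 means the
    two representations match up to rotation/isotropic scaling."""
    # Center each feature matrix per dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Toy stand-ins for features of the forget-class samples under the
# original and the unlearned model (hypothetical data, not real models).
rng = np.random.default_rng(0)
feats_original = rng.standard_normal((512, 128))
feats_unlearned = rng.standard_normal((512, 128))

print(linear_cka(feats_original, feats_original))   # ~1: representations unchanged
print(linear_cka(feats_original, feats_unlearned))  # low: representations diverged
```

Under this reading, an unlearned model that only edits its last layer would still show high CKA to the original model on forget-class features, which is exactly the failure mode the logit-based metrics miss.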
Similar Papers
Towards Reliable Forgetting: A Survey on Machine Unlearning Verification
Machine Learning (CS)
Proves computers forgot secret data correctly.
Existing Large Language Model Unlearning Evaluations Are Inconclusive
Machine Learning (CS)
Fixes how computers forget unwanted information.
Forget to Know, Remember to Use: Context-Aware Unlearning for Large Language Models
Computation and Language
Keeps AI smart but forgets bad info.