Score: 2

Are We Truly Forgetting? A Critical Re-examination of Machine Unlearning Evaluation Protocols

Published: March 10, 2025 | arXiv ID: 2503.06991v2

By: Yongwoo Kim, Sungmin Cha, Donghyun Kim

Potential Business Impact:

Provides a stricter way to verify that AI models have genuinely forgotten specific data, reducing privacy and compliance risk.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

Machine unlearning is the process of removing specific data points from a trained model while maintaining performance on the retained data, addressing privacy or legal requirements. Despite its importance, existing unlearning evaluations tend to focus on logit-based metrics (i.e., accuracy) under small-scale scenarios. We observe that this can create a false sense of security about unlearning approaches in real-world settings. In this paper, we conduct a comprehensive evaluation that applies representation-based analyses to the unlearned model under large-scale scenarios, in order to verify whether unlearning approaches genuinely eliminate the targeted forget data from the model's representations. Our analysis reveals that current state-of-the-art unlearning approaches either completely degrade the representational quality of the unlearned model or merely modify the classifier (i.e., the last layer), thereby achieving strong logit-based evaluation scores while retaining significant representational similarity to the original model. Furthermore, we introduce a more rigorous unlearning evaluation setup in which the forgetting classes are semantically similar to downstream-task classes, so that genuine forgetting requires the feature representations to diverge substantially from those of the original model; this enables a stricter evaluation from the representation perspective. We hope our benchmark serves as a standardized protocol for evaluating unlearning algorithms under realistic conditions.
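As a rough illustration of what a representation-based check involves (a sketch, not the authors' exact protocol), the snippet below computes linear CKA similarity between penultimate-layer features of the original and unlearned models on forget-class inputs. A score near 1 would suggest the internal representations barely changed, i.e., the "forgetting" may be confined to the classifier head. The function name and the choice of CKA as the similarity measure are assumptions made for illustration.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices.

    X, Y: (n_samples, n_features) activations extracted from the SAME
    inputs (e.g., forget-class images) by two models, such as the
    original model and the unlearned model. Returns a similarity in
    [0, 1]; values close to 1 mean the representations are nearly
    identical up to a linear transformation.
    """
    # Center features so the score is invariant to mean shifts.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # HSIC-based formulation of linear CKA (Kornblith et al., 2019).
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro") *
                   np.linalg.norm(Y.T @ Y, ord="fro"))
    return numerator / denominator

# Hypothetical usage: feats_original and feats_unlearned are
# penultimate-layer activations on the forget set.
# similarity = linear_cka(feats_original, feats_unlearned)
```

In practice such a similarity score would be read alongside logit-based metrics: a large drop in forget-set accuracy combined with near-identical representations is precisely the failure mode the paper highlights.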

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
18 pages

Category
Computer Science:
Machine Learning (CS)