Geometric-Disentanglement Unlearning
By: Duo Zhou, Yuji Zhang, Tianxin Wei, and more
Potential Business Impact:
Removes data from AI without hurting its skills.
Machine unlearning, the removal of a training subset's influence from a deployed model, is critical for privacy preservation and model reliability, yet gradient ascent on forget samples often harms retained knowledge. Existing approaches face a persistent tradeoff between effective forgetting and preservation on the retain set. While previous methods provide useful heuristics, they often lack a formal analysis of how exactly forgetting updates harm retained knowledge, and of whether these side effects can be removed with theoretical guarantees. To explore a theoretically sound and simple solution, we start from first principles on how performance on the retain set is actually affected: a first-order analysis of the local change in the retain loss under small parameter updates during training. This yields a crisp equivalence: the retain loss is unchanged to first order iff the update direction is orthogonal to the subspace spanned by retain gradients ("retain-invariant"). This identifies the entangled component as the tangential part of the forget update within the retain-gradient subspace, and characterizes disentanglement as orthogonality. Guided by this, we propose Geometric-Disentanglement Unlearning (GU), which decomposes any candidate forget-gradient update into components tangential and normal to the retain-gradient subspace and executes only the normal component. Under a standard trust-region budget, the projected direction aligned with the raw forget gradient is optimal among all first-order retain-invariant moves, and we also derive the optimal projected direction for joint forget-retain objectives. Our method is plug-and-play and can be attached to existing gradient-based unlearning procedures to mitigate side effects. GU achieves consistent improvements over various methods across three benchmarks: TOFU, MUSE, and WMDP.
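To make the projection step concrete, here is a minimal sketch of the geometric decomposition the abstract describes: keeping only the component of the forget gradient orthogonal to the span of the retain gradients. It assumes flattened gradient vectors and NumPy; the function and variable names (project_out_retain_subspace, forget_grad, retain_grads) are illustrative and not taken from the paper's code.

```python
import numpy as np

def project_out_retain_subspace(forget_grad: np.ndarray,
                                retain_grads: np.ndarray) -> np.ndarray:
    """Return the component of `forget_grad` orthogonal to span(retain_grads).

    forget_grad : (d,)   candidate forget-update direction (e.g., a gradient-ascent step)
    retain_grads: (k, d) per-sample or per-batch retain gradients spanning the
                  retain-gradient subspace
    """
    # Orthonormal basis of the retain-gradient subspace via a reduced QR decomposition.
    Q, _ = np.linalg.qr(retain_grads.T)      # Q: (d, k), orthonormal columns
    tangential = Q @ (Q.T @ forget_grad)     # projection onto the retain subspace (entangled part)
    normal = forget_grad - tangential        # first-order retain-invariant part
    return normal

# Usage: take the retain-invariant direction within whatever step-size / trust-region
# budget the base unlearning method already uses.
d, k = 1000, 8
rng = np.random.default_rng(0)
g_forget = rng.normal(size=d)
G_retain = rng.normal(size=(k, d))
step = project_out_retain_subspace(g_forget, G_retain)
# To first order, moving along `step` leaves the retain loss unchanged:
print(np.abs(G_retain @ step).max())         # ~0 up to numerical error
```

Because the sketch only modifies the update direction, it can in principle be wrapped around any gradient-based forgetting step, which is the plug-and-play behavior the abstract claims; the exact subspace construction and trust-region handling in the paper may differ.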
Similar Papers
UNO: Unlearning via Orthogonalization in Generative models
Machine Learning (CS)
Removes bad data from AI without retraining.
Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models
Computation and Language
Removes bad info from AI, keeps good info.
GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models
Machine Learning (CS)
Keeps AI smart while removing bad info.