Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning
By: Naixin Zhai, Pengyang Shao, Binbin Zheng, and more
Potential Business Impact:
Makes AI forget bad info without losing skills.
Machine unlearning aims to remove sensitive knowledge from Large Language Models (LLMs) while maintaining general utility. However, existing approaches typically treat all tokens in a response indiscriminately and enforce uncertainty over the entire vocabulary. This global treatment causes unnecessary utility degradation and extends optimization to content-agnostic regions. To address these limitations, we propose PALU (Prefix-Aware Localized Unlearning), a framework driven by a local entropy-maximization objective along both the temporal and vocabulary dimensions. PALU reveals that (i) suppressing the sensitive prefix alone is sufficient to sever the causal generation link, and (ii) flattening only the top-$k$ logits is adequate to maximize uncertainty in the critical subspace. These findings allow PALU to avoid redundant optimization across the full vocabulary and parameter space while minimizing collateral damage to general model performance. Extensive experiments validate that PALU achieves superior forgetting efficacy and utility preservation compared to state-of-the-art baselines.
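To make the two localizations concrete, here is a minimal PyTorch sketch of what such an objective could look like: entropy is maximized only over the top-$k$ logits at each position (the vocabulary dimension) and only on positions flagged as the sensitive prefix (the temporal dimension). The function name, the `prefix_mask` convention, and the default $k$ are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def topk_prefix_entropy_loss(logits: torch.Tensor,
                             prefix_mask: torch.Tensor,
                             k: int = 10) -> torch.Tensor:
    """Hypothetical sketch of a prefix-localized, top-k entropy objective.

    logits:      (batch, seq_len, vocab) model outputs on the forget set
    prefix_mask: (batch, seq_len) bool, True only on sensitive-prefix tokens
    k:           size of the logit subspace to flatten (assumed hyperparameter)
    """
    # Vocabulary localization: keep only the k largest logits per position
    # and renormalize within that subspace, leaving the tail untouched.
    topk_logits, _ = logits.topk(k, dim=-1)           # (B, T, k)
    log_probs = F.log_softmax(topk_logits, dim=-1)    # softmax within top-k only
    entropy = -(log_probs.exp() * log_probs).sum(-1)  # (B, T) per-token entropy

    # Temporal localization: score only the sensitive-prefix positions,
    # not the whole response.
    mask = prefix_mask.float()
    masked_entropy = (entropy * mask).sum()

    # Negate so that minimizing the loss maximizes entropy on the prefix.
    return -masked_entropy / mask.sum().clamp_min(1.0)
```

In a full training loop this term would presumably be combined with a retention loss on a retain set; the point of the sketch is that flattening only the top-$k$ logits on the prefix leaves both the rest of the vocabulary distribution and the content-agnostic remainder of the response out of the optimization, which is what the abstract credits with limiting collateral damage.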
Similar Papers
Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
Machine Learning (CS)
Removes bad info from AI, making it safer.
Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models
Computation and Language
Removes bad info from AI, keeps good info.
Not Every Token Needs Forgetting: Selective Unlearning to Limit Change in Utility in Large Language Model Unlearning
Computation and Language
Removes bad info without forgetting good knowledge.