Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding
By: Hadi Reisizadeh, Jiajun Ruan, Yiwei Chen, and more
Potential Business Impact:
Shows that "unlearned" AI can still leak private information, exposing compliance risk.
Unlearning in large language models (LLMs) is critical for regulatory compliance and for building ethical generative AI systems that avoid producing private, toxic, illegal, or copyrighted content. Despite rapid progress, in this work we show that almost all existing unlearning methods fail to achieve true forgetting in practice. Specifically, while evaluations of these "unlearned" models under deterministic (greedy) decoding often suggest successful knowledge removal on standard benchmarks (as has been done in the literature), we show that sensitive information reliably resurfaces when models are sampled with standard probabilistic decoding. To rigorously capture this vulnerability, we introduce leak@$k$, a new meta-evaluation metric that quantifies the likelihood of forgotten knowledge reappearing when $k$ samples are generated from the model under realistic decoding strategies. Using three widely adopted benchmarks, TOFU, MUSE, and WMDP, we conduct the first large-scale, systematic study of unlearning reliability under the newly defined leak@$k$ metric. Our findings show that knowledge leakage persists across methods and tasks, underscoring that current state-of-the-art unlearning techniques provide only limited forgetting and highlighting the urgent need for more robust approaches to LLM unlearning.
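The abstract does not spell out how leak@$k$ is estimated, but a natural formulation mirrors the pass@$k$ estimator from code generation: given $n$ sampled generations of which $c$ contain the supposedly forgotten content, the probability that at least one of $k$ draws leaks is $1 - \binom{n-c}{k} / \binom{n}{k}$. The sketch below illustrates this; the function name and the estimator choice are assumptions for illustration, not the authors' published formulation.

```python
import math

def leak_at_k(n: int, c: int, k: int) -> float:
    """Estimate leak@k: the probability that at least one of k generations
    (drawn without replacement from n samples, c of which leaked the
    forgotten content) reproduces the unlearned knowledge.

    Assumed form, mirroring the standard pass@k estimator:
        leak@k = 1 - C(n - c, k) / C(n, k)
    """
    if k > n:
        raise ValueError("k cannot exceed the number of samples n")
    if n - c < k:
        return 1.0  # every size-k subset must include a leaking sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 100 sampled generations, 3 of which surface the
# supposedly unlearned fact.
print(leak_at_k(n=100, c=3, k=1))   # ~0.03
print(leak_at_k(n=100, c=3, k=10))  # ~0.27
```

Note how the metric compounds with $k$: a model that leaks on only 3% of single samples leaks roughly 27% of the time once a user draws 10 samples, which is why greedy-decoding evaluations can understate the risk.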
Similar Papers
Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting
Machine Learning (CS)
Lets AI forget private information when asked.
LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data
Machine Learning (CS)
Cleans AI without needing perfect instructions.
Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
Machine Learning (CS)
Removes bad info from AI, making it safer.