LLM Unlearning Without an Expert Curated Dataset
By: Xiaoyuan Zhu, Muru Zhang, Ollie Liu, and more
Potential Business Impact:
Teaches AI models to forget harmful or sensitive information.
Modern large language models often encode sensitive, harmful, or copyrighted knowledge, raising the need for post-hoc unlearning: the ability to remove specific domains of knowledge from a model without full retraining. A major bottleneck in current unlearning pipelines is constructing effective forget sets, datasets that approximate the target domain and guide the model to forget it. In this work, we introduce a scalable, automated approach to generating high-quality forget sets using language models themselves. Our method synthesizes textbook-style data through a structured prompting pipeline, requiring only a domain name as input. Through experiments on unlearning biosecurity, cybersecurity, and the Harry Potter novels, we show that our synthetic datasets consistently outperform baseline synthetic alternatives and are comparable to expert-curated ones. Additionally, ablation studies reveal that the multi-step generation pipeline significantly boosts data diversity, which in turn improves unlearning utility. Overall, our findings suggest that synthetic datasets offer a promising path toward practical, scalable unlearning for a wide range of emerging domains without the need for manual intervention. We release our code and dataset at https://github.com/xyzhu123/Synthetic_Textbook.
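To make the "structured prompting pipeline" idea concrete, below is a minimal sketch of a multi-step forget-set generator that takes only a domain name as input. It is not the paper's exact pipeline or prompts: the two-step outline-then-passages structure, the `synthesize_forget_set` function, and the generic `complete(prompt)` callable are illustrative assumptions; plug in whatever LLM client you actually use.

```python
# Hypothetical sketch: expand a domain name into a topic outline, then write
# textbook-style passages per topic. Prompts and step count are illustrative.
from typing import Callable, List


def synthesize_forget_set(domain: str,
                          complete: Callable[[str], str],
                          passages_per_topic: int = 2) -> List[str]:
    """Generate textbook-style passages for `domain` via a two-step pipeline."""
    # Step 1: expand the bare domain name into a diverse topic outline.
    outline = complete(
        f"List 10 distinct chapter topics for an introductory textbook on {domain}. "
        "One topic per line."
    )
    topics = [line.strip("- ").strip() for line in outline.splitlines() if line.strip()]

    # Step 2: write several textbook-style passages for each topic.
    passages: List[str] = []
    for topic in topics:
        for _ in range(passages_per_topic):
            passages.append(complete(
                f"Write a concise textbook-style passage about '{topic}' "
                f"in the domain of {domain}."
            ))
    return passages


if __name__ == "__main__":
    # Stub LLM for demonstration; replace with a real chat/completions client.
    fake_llm = lambda prompt: f"[generated text for: {prompt[:60]}...]"
    forget_set = synthesize_forget_set("biosecurity", fake_llm)
    print(len(forget_set), "passages generated")
```

The multi-step structure matters because, per the ablation in the abstract, generating an intermediate topic outline before writing passages increases the diversity of the resulting forget set, which improves unlearning utility.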
Similar Papers
A Survey on Unlearning in Large Language Models
Computation and Language
Lets AI forget private or bad information.
From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning
Computation and Language
Teaches computers to forget specific information.
LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data
Machine Learning (CS)
Cleans AI without needing perfect instructions.