FAME: Fictional Actors for Multilingual Erasure
By: Claudio Savelli , Moreno La Quatra , Alkis Koudounas and more
Potential Business Impact:
Removes specific facts from AI without retraining.
LLMs trained on web-scale data raise concerns about privacy and the right to be forgotten. To address these issues, Machine Unlearning provides techniques to remove specific information from trained models without retraining from scratch. However, existing benchmarks for evaluating unlearning in LLMs face two major limitations: they focus only on English and support only entity-level forgetting (removing all information about a person). We introduce FAME (Fictional Actors for Multilingual Erasure), a synthetic benchmark for evaluating Machine Unlearning across five languages: English, French, German, Italian, and Spanish. FAME contains 1,000 fictional actor biographies and 20,000 question-answer pairs. Each biography includes information on 20 topics organized into structured categories (biography, career, achievements, personal information). This design enables both entity-level unlearning (i.e., forgetting entire identities) and instance-level unlearning (i.e., forgetting specific facts while retaining others). We provide two dataset splits to support these two different unlearning scenarios and enable systematic comparison of unlearning techniques across languages. Since FAME uses entirely fictional data, it ensures that the information was never encountered during model pretraining, allowing for a controlled evaluation of unlearning methods.
Similar Papers
What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests
Computation and Language
Finds and removes personal facts from AI.
A Survey on Unlearning in Large Language Models
Computation and Language
Lets AI forget private or bad information.
Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting
Machine Learning (CS)
Lets AI forget private information when asked.