SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models

Published: April 2, 2025 | arXiv ID: 2504.02883v1

By: Anil Ramakrishna, Yixin Wan, Xiaomeng Jin and more

Potential Business Impact:

Enables removal of private or sensitive information from trained AI language models.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

We introduce SemEval-2025 Task 4: unlearning sensitive content from Large Language Models (LLMs). The task features three subtasks for LLM unlearning spanning different use cases: (1) unlearn long-form synthetic creative documents spanning different genres; (2) unlearn short-form synthetic biographies containing personally identifiable information (PII), including fake names, phone numbers, SSNs, email addresses and home addresses; and (3) unlearn real documents sampled from the target model's training dataset. We received over 100 submissions from over 30 institutions, and we summarize the key techniques and lessons in this paper.
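The abstract does not specify which unlearning methods the submissions used, but a common baseline in the LLM-unlearning literature is "gradient difference": ascend the loss on a forget set while descending the loss on a retain set. The following is a minimal, hedged sketch of that idea on a toy logistic model, not the task's actual LLMs; all data, dimensions and hyperparameters here are illustrative assumptions.

```python
import math
import random

random.seed(0)

def make_set(n, label_dim):
    """Toy 2-feature dataset; the label depends on one chosen feature."""
    data = []
    for _ in range(n):
        x = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
        data.append((x, 1.0 if x[label_dim] > 0 else 0.0))
    return data

retain = make_set(64, 0)   # knowledge we want to keep
forget = make_set(16, 1)   # stand-in for "sensitive" examples to unlearn

def loss_and_grad(w, data):
    """Mean logistic loss and its gradient over a dataset."""
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        loss -= y * math.log(p + 1e-9) + (1 - y) * math.log(1 - p + 1e-9)
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi
    n = len(data)
    return loss / n, [g / n for g in grad]

# Train on retain + forget: the base model has "memorized" the forget set.
w = [0.0, 0.0]
for _ in range(300):
    _, g = loss_and_grad(w, retain + forget)
    w = [wi - 0.5 * gi for wi, gi in zip(w, g)]

forget_before, _ = loss_and_grad(w, forget)

# Unlearn: ascend the forget-set loss while descending the retain-set loss.
for _ in range(50):
    _, gf = loss_and_grad(w, forget)
    _, gr = loss_and_grad(w, retain)
    w = [wi + 0.1 * gfi - 0.1 * gri for wi, gfi, gri in zip(w, gf, gr)]

forget_after, _ = loss_and_grad(w, forget)
print(forget_before < forget_after)  # forget-set loss should rise after unlearning
```

In practice, task submissions evaluated unlearning on LLMs with far more nuanced metrics (e.g. membership inference and utility retention); this sketch only shows the core gradient-difference update.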

Repos / Data Links

Page Count
13 pages

Category
Computer Science:
Computation and Language