EMMA: Concept Erasure Benchmark with Comprehensive Semantic Metrics and Diverse Categories
By: Lu Wei, Yuta Nakashima, Noa Garcia
The widespread adoption of text-to-image (T2I) generation has raised concerns about privacy, bias, and copyright violations. Concept erasure techniques offer a promising solution by selectively removing undesired concepts from pre-trained models without full retraining. However, these methods are often evaluated on a limited set of concepts with overly simplistic, direct prompts. To test the boundaries of concept erasure techniques and assess whether they truly remove targeted concepts from model representations, we introduce EMMA, a benchmark that evaluates five key dimensions of concept erasure through 12 metrics. EMMA goes beyond standard metrics such as image quality and time efficiency: it tests robustness under challenging conditions, including indirect descriptions, visually similar non-target concepts, and potential gender and ethnicity bias, providing a socially aware analysis of method behavior. Using EMMA, we analyze five concept erasure methods across five domains (objects, celebrities, art styles, NSFW content, and copyright). Our results show that existing methods struggle with implicit prompts (i.e., they generate the erased concept when it is indirectly referenced) and with visually similar non-target concepts (i.e., they fail to generate non-targeted concepts resembling the erased one), while some amplify gender and ethnicity bias relative to the original model.