Mass Concept Erasure in Diffusion Models with Concept Hierarchy
By: Jiahang Tu, Ye Li, Yiming Wu and more
Potential Business Impact:
Keeps AI image generators from producing many kinds of unwanted pictures at once.
The success of diffusion models has raised concerns about the generation of unsafe or harmful content, prompting concept erasure approaches that fine-tune modules to suppress specific concepts while preserving general generative capabilities. However, as the number of erased concepts grows, these methods often become inefficient and ineffective, since each concept requires a separate set of fine-tuned parameters and may degrade the overall generation quality. In this work, we propose a supertype-subtype concept hierarchy that organizes erased concepts into a parent-child structure. Each erased concept is treated as a child node, and semantically related concepts (e.g., macaw and bald eagle) are grouped under a shared parent node, referred to as a supertype concept (e.g., bird). Rather than erasing concepts individually, we introduce an effective and efficient group-wise suppression method, where semantically similar concepts are grouped and erased jointly by sharing a single set of learnable parameters. During the erasure phase, standard diffusion regularization is applied to preserve the denoising process in unmasked regions. To mitigate the degradation of supertype generation caused by excessive erasure of semantically related subtypes, we propose a novel method called Supertype-Preserving Low-Rank Adaptation (SuPLoRA), which encodes the supertype concept information in the frozen down-projection matrix and updates only the up-projection matrix during erasure. Theoretical analysis demonstrates the effectiveness of SuPLoRA in mitigating generation performance degradation. We construct a more challenging benchmark that requires simultaneous erasure of concepts across diverse domains, including celebrities, objects, and pornographic content.
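The idea of a frozen down-projection with a trainable up-projection can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the paper's implementation: the abstract does not say how the supertype is encoded in the down-projection, so the sketch assumes one plausible choice, namely that the rows of the down-projection `A` are orthogonal to the supertype embedding. The random embeddings, the single linear layer `W`, the zero-output erasure target, and the closed-form least-squares fit of `B` (standing in for gradient training) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, out_dim, rank = 16, 8, 4

# Toy concept hierarchy: one supertype (parent) with two erased subtypes (children).
hierarchy = {"bird": ["macaw", "bald eagle"]}

# Hypothetical embeddings: each subtype is the supertype plus a random offset.
emb = {"bird": rng.standard_normal(d)}
for child in hierarchy["bird"]:
    emb[child] = emb["bird"] + 0.5 * rng.standard_normal(d)


def frozen_down_projection(supertype_vec, rank, rng):
    """Return a (rank, d) matrix with orthonormal rows orthogonal to supertype_vec.

    Assumption: "encoding" the supertype means placing it in the null space of A.
    """
    u = supertype_vec / np.linalg.norm(supertype_vec)
    R = rng.standard_normal((supertype_vec.shape[0], rank))
    R -= np.outer(u, u @ R)      # remove the supertype component from each column
    Q, _ = np.linalg.qr(R)       # orthonormalise; the span stays orthogonal to u
    return Q.T


W = rng.standard_normal((out_dim, d))                # frozen pretrained weight
A = frozen_down_projection(emb["bird"], rank, rng)   # frozen down-projection

# Group-wise erasure: both subtypes share one up-projection B.
# Toy objective (assumption): drive the layer's response to each subtype to zero.
X = np.stack([emb[c] for c in hierarchy["bird"]], axis=1)   # (d, n_children)
T = np.zeros((out_dim, X.shape[1]))                         # erasure targets
B = (T - W @ X) @ np.linalg.pinv(A @ X)                     # fit B only; A and W stay fixed


def forward(x):
    """Adapted layer: frozen weight plus the low-rank update B @ A."""
    return W @ x + B @ (A @ x)


print("supertype output shift:", np.linalg.norm(forward(emb["bird"]) - W @ emb["bird"]))
for child in hierarchy["bird"]:
    print(child, "output norm after erasure:", np.linalg.norm(forward(emb[child])))
```

Because `A` maps the supertype embedding to zero in this construction, the update `B @ A` cannot change the layer's response to the supertype whatever values `B` takes, while the subtypes' offsets still pass through `A` and can be suppressed with a single shared `B`.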
Similar Papers
Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression
CV and Pattern Recognition
Stops AI from making bad or copied pictures.
Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
Machine Learning (CS)
Makes AI forget bad ideas without forgetting good ones.
Rethinking Robust Adversarial Concept Erasure in Diffusion Models
CV and Pattern Recognition
Removes bad ideas from AI art generators.