Science Cast

When Are Concepts Erased From Diffusion Models?

librarian, May 24, 2025 9:01am

This paper is a preprint and has not been certified by peer review.

arXiv, May 22, 2025 12:00am

Authors

Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh Pham, David Bau, Chinmay Hegde, Niv Cohen

Abstract

Concept erasure, the ability to selectively prevent a model from generating specific concepts, has attracted growing interest, with various approaches emerging to address the challenge. However, it remains unclear how thoroughly these methods erase the target concept. We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) reducing the likelihood of generating the target concept, and (ii) interfering with the model's internal guidance mechanisms. To thoroughly assess whether a concept has been truly erased from the model, we introduce a suite of independent evaluations. Our evaluation framework includes adversarial attacks, novel probing techniques, and analysis of the model's alternative generations in place of the erased concept. Our results shed light on the tension between minimizing side effects and maintaining robustness to adversarial prompts. Broadly, our work underlines the importance of comprehensive evaluation for erasure in diffusion models.
