
Generalization of Diffusion Models Arises with a Balanced Representation Space

Published: December 24, 2025 | arXiv ID: 2512.20963v1

By: Zekai Zhang, Xiao Li, Xiang Li, and more

Diffusion models excel at generating high-quality, diverse samples, yet they risk memorizing training data when overfit to the training objective. We analyze the distinction between memorization and generalization in diffusion models through the lens of representation learning. By investigating a two-layer ReLU denoising autoencoder (DAE), we prove that (i) memorization corresponds to the model storing raw training samples in the learned weights for encoding and decoding, yielding localized "spiky" representations, whereas (ii) generalization arises when the model captures local data statistics, producing "balanced" representations. Furthermore, we validate these theoretical findings on real-world unconditional and text-to-image diffusion models, demonstrating that the same representation structures emerge in deep generative models, with significant practical implications. Building on these insights, we propose a representation-based method for detecting memorization and a training-free editing technique that allows precise control via representation steering. Together, our results highlight that learning good representations is central to novel and meaningful generative modeling.
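To make the abstract's setup concrete, below is a minimal sketch of a two-layer ReLU denoising autoencoder together with a simple per-sample statistic on its hidden activations, contrasting localized "spiky" representations with more "balanced" ones. This is an illustrative reading of the abstract only: the layer sizes, noise scale, training loop, and the `spikiness` proxy are assumptions, not the paper's actual architecture or metric.

```python
# Minimal sketch (assumed setup, not the paper's exact one): a two-layer ReLU
# denoising autoencoder and a crude "spikiness" statistic on its hidden
# representation.
import torch
import torch.nn as nn


class TwoLayerReLUDAE(nn.Module):
    """Two-layer ReLU DAE: encode a noisy input, decode a clean reconstruction."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.encoder = nn.Linear(dim, hidden)
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x_noisy: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.encoder(x_noisy))  # hidden representation
        return self.decoder(h)

    def representation(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.encoder(x))


def spikiness(h: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Assumed proxy: ratio of max to mean activation per sample.
    Large values suggest a localized ("spiky") code; values near 1 suggest a
    more balanced one. Not the paper's actual measure."""
    return h.max(dim=-1).values / (h.mean(dim=-1) + eps)


if __name__ == "__main__":
    torch.manual_seed(0)
    dim, hidden, n = 64, 128, 256
    x_clean = torch.randn(n, dim)
    x_noisy = x_clean + 0.5 * torch.randn_like(x_clean)  # assumed noise level

    model = TwoLayerReLUDAE(dim, hidden)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # Short illustrative denoising objective: reconstruct clean from noisy.
    for _ in range(200):
        opt.zero_grad()
        loss = loss_fn(model(x_noisy), x_clean)
        loss.backward()
        opt.step()

    h = model.representation(x_noisy)
    print("mean spikiness:", spikiness(h).mean().item())
```

In this reading, a memorizing model would drive the statistic up (a few units dominating per sample), while a generalizing model would keep activations spread more evenly; the paper's own detection and representation-steering methods should be consulted for the actual criteria.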

Category
Computer Science:
Machine Learning (CS)