Data Cartography for Detecting Memorization Hotspots and Guiding Data Interventions in Generative Models
By: Laksh Patel, Neel Shanbhag
Potential Business Impact:
Makes AI forget private data it learned.
Modern generative models risk overfitting and unintentionally memorizing rare training examples, which can be extracted by adversaries or inflate benchmark performance. We propose Generative Data Cartography (GenDataCarto), a data-centric framework that assigns each pretraining sample a difficulty score (early-epoch loss) and a memorization score (frequency of "forget events"), then partitions examples into four quadrants to guide targeted pruning and up-/down-weighting. We prove that our memorization score lower-bounds classical influence under smoothness assumptions and that down-weighting high-memorization hotspots provably decreases the generalization gap via uniform stability bounds. Empirically, GenDataCarto reduces synthetic canary extraction success by over 40% at just 10% data pruning, while increasing validation perplexity by less than 0.5%. These results demonstrate that principled data interventions can dramatically mitigate leakage with minimal cost to generative performance.
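For intuition, the sketch below shows how per-sample difficulty and memorization scores of this kind could be computed from per-epoch loss traces and turned into quadrant-based sample weights. This is a minimal illustration, not the paper's implementation: the function names, the loss-threshold definition of a "forget event", and the pruning heuristic are all assumptions made for the example.

```python
import numpy as np

def gendatacarto_scores(loss_history, early_epochs=3, forget_threshold=None):
    """Illustrative scoring in the spirit of GenDataCarto (not the authors' code).

    loss_history: array of shape (num_epochs, num_samples) with the per-sample
                  training loss recorded once per epoch.
    Returns (difficulty, memorization), each of shape (num_samples,).
    """
    loss_history = np.asarray(loss_history, dtype=float)

    # Difficulty score: mean loss over the first few ("early") epochs.
    difficulty = loss_history[:early_epochs].mean(axis=0)

    # Threshold separating "learned" from "not learned"; the median
    # final-epoch loss is an arbitrary illustrative choice.
    if forget_threshold is None:
        forget_threshold = np.median(loss_history[-1])

    # Memorization score: count of forget events, i.e. epochs where a sample
    # that was below the threshold (learned) rises back above it (forgotten).
    learned = loss_history < forget_threshold
    memorization = (learned[:-1] & ~learned[1:]).sum(axis=0)

    return difficulty, memorization


def quadrant_weights(difficulty, memorization, prune_frac=0.10, downweight=0.5):
    """Partition samples into four quadrants and build sampling weights."""
    d_hi = difficulty > np.median(difficulty)
    m_hi = memorization > np.median(memorization)
    quadrants = {
        "easy-stable": ~d_hi & ~m_hi,
        "hard-stable": d_hi & ~m_hi,
        "easy-memorized": ~d_hi & m_hi,
        "hard-memorized": d_hi & m_hi,
    }

    weights = np.ones_like(difficulty, dtype=float)
    # Down-weight high-memorization samples.
    weights[m_hi] *= downweight
    # Prune a fixed fraction of the strongest "hotspots"; ranking by the sum of
    # the two scores is a simple stand-in for whatever criterion the paper uses.
    hotspots = np.argsort(-(memorization + difficulty))[: int(prune_frac * len(weights))]
    weights[hotspots] = 0.0
    return weights, quadrants
```

The weights could then be passed to a weighted sampler during continued pretraining, so that pruned hotspots are skipped and remaining high-memorization samples are seen less often.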
Similar Papers
Memorization in 3D Shape Generation: An Empirical Study
CV and Pattern Recognition
Finds if AI copies 3D shapes it learned.
Unconsciously Forget: Mitigating Memorization Without Knowing What is being Memorized
CV and Pattern Recognition
Stops AI from copying art it learned from.
Beyond Memorization: Gradient Projection Enables Selective Learning in Diffusion Models
Machine Learning (CS)
Stops AI from copying private images.