Self-Consuming Generative Models with Adversarially Curated Data
By: Xiukun Wei, Xueru Zhang
Potential Business Impact:
Shows how adversaries can steer AI models toward learning the wrong user preferences from manipulated training data.
Recent advances in generative models have made it increasingly difficult to distinguish real data from model-generated synthetic data. Using synthetic data for successive training of future model generations creates "self-consuming loops", which may lead to model collapse or training instability. Furthermore, synthetic data is often subject to human feedback and curated by users based on their preferences. Ferbach et al. (2024) recently showed that when data is curated according to user preferences, the self-consuming retraining loop drives the model to converge toward a distribution that optimizes those preferences. However, in practice, data curation is often noisy or adversarially manipulated. For example, competing platforms may recruit malicious users to adversarially curate data and disrupt rival models. In this paper, we study how generative models evolve under self-consuming retraining loops with noisy and adversarially curated data. We theoretically analyze the impact of such noisy data curation on generative models and identify conditions for the robustness of the retraining process. Building on this analysis, we design attack algorithms for competitive adversarial scenarios, where a platform with a limited budget employs malicious users to misalign a rival's model from actual user preferences. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithms.
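The abstract describes a self-consuming retraining loop in which model-generated samples are curated by (possibly noisy or adversarial) pairwise user preferences before being used for retraining. The toy simulation below illustrates this dynamic with a 1-D Gaussian model; the preferred value, noise rate, and curation rule are illustrative assumptions, not the paper's actual setup or attack algorithm.

```python
import random
import statistics

random.seed(0)

PREFERRED = 2.0  # hypothetical value users prefer (assumption)
NOISE = 0.3      # fraction of curation choices flipped by malicious users (assumption)

def curate(a, b, noise):
    """Pairwise curation: keep the sample closer to the preferred value,
    but an adversary flips the choice with probability `noise`."""
    better = a if abs(a - PREFERRED) < abs(b - PREFERRED) else b
    worse = b if better is a else a
    return worse if random.random() < noise else better

# Start from a "model" (a Gaussian) centered away from the preference.
mu, sigma = 0.0, 1.0
for generation in range(30):
    samples = [random.gauss(mu, sigma) for _ in range(2000)]
    # Each retained point is the winner of one noisy pairwise comparison.
    curated = [curate(samples[i], samples[i + 1], NOISE)
               for i in range(0, len(samples), 2)]
    # Retrain: refit the Gaussian to the curated synthetic data.
    mu = statistics.fmean(curated)
    sigma = max(statistics.stdev(curated), 1e-3)

# With NOISE = 0 the mean drifts cleanly toward PREFERRED; adversarial
# flips weaken that pressure, and larger NOISE can stall or misalign it.
print(round(mu, 2))
```

Varying `NOISE` makes the trade-off the paper studies visible in miniature: clean curation drives the model toward user preferences, while a budget of flipped comparisons degrades or redirects that convergence.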
Similar Papers
Convergence and Stability Analysis of Self-Consuming Generative Models with Heterogeneous Human Curation
Machine Learning (Stat)
Analyzes when models trained on their own human-curated outputs converge and stay stable.
Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop
Artificial Intelligence
Identifies and mitigates bias that language models amplify when retrained on their own outputs.
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
Machine Learning (CS)
Gives theoretical conditions for avoiding collapse when models retrain on their own outputs.