Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression
By: Anvit Garg, Sohom Bhattacharya, Pragya Sur
Potential Business Impact:
Keeps AI from forgetting what it learned.
Model collapse occurs when generative models degrade after repeatedly training on their own synthetic outputs. We study this effect in overparameterized linear regression in a setting where each iteration mixes fresh real labels with synthetic labels drawn from the model fitted in the previous iteration. We derive precise generalization error formulae for minimum-$\ell_2$-norm interpolation and ridge regression under this iterative scheme. Our analysis reveals intriguing properties of the optimal mixing weight that minimizes long-term prediction error and provably prevents model collapse. For instance, in the case of min-$\ell_2$-norm interpolation, we establish that the optimal real-data proportion converges to the reciprocal of the golden ratio for fairly general classes of covariate distributions. Previously, this property was known only for ordinary least squares, and only in low dimensions. For ridge regression, we further analyze two popular model classes -- the random-effects model and the spiked covariance model -- demonstrating how spectral geometry governs optimal weighting. In both cases, as well as for isotropic features, we uncover that the optimal mixing ratio should be at least one-half, reflecting the necessity of favoring real data over synthetic data. We validate our theoretical results with extensive simulations.
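The iterative scheme in the abstract can be sketched numerically. The snippet below is a minimal, illustrative simulation (not the paper's exact setup): at each round, fresh isotropic covariates are drawn, real noisy labels are mixed with synthetic labels produced by the previously fitted model, and a minimum-$\ell_2$-norm interpolator is refit. The mixing weight is set to the reciprocal of the golden ratio, $(\sqrt{5}-1)/2 \approx 0.618$, the limit the paper identifies as optimal; the dimensions, noise level, and number of rounds are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                      # overparameterized regime: p > n
beta_star = rng.normal(size=p) / np.sqrt(p)   # ground-truth coefficients
sigma = 0.5                         # label noise level (illustrative)
alpha = (np.sqrt(5) - 1) / 2        # real-data proportion ~ 0.618

def min_norm_fit(X, y):
    # minimum-l2-norm interpolator via the Moore-Penrose pseudoinverse
    return np.linalg.pinv(X) @ y

beta_hat = np.zeros(p)
for t in range(10):
    X = rng.normal(size=(n, p))                     # fresh covariates
    y_real = X @ beta_star + sigma * rng.normal(size=n)
    y_synth = X @ beta_hat                          # labels from previous model
    y_mix = alpha * y_real + (1 - alpha) * y_synth  # mixed training labels
    beta_hat = min_norm_fit(X, y_mix)

# For isotropic features, excess prediction risk is ||beta_hat - beta_star||^2
risk = float(np.sum((beta_hat - beta_star) ** 2))
```

Sweeping `alpha` over a grid in this simulation (and averaging over seeds) is one way to reproduce, qualitatively, the paper's finding that the long-run risk is minimized near the golden-ratio weight rather than at pure real data (`alpha = 1`) or pure synthetic data (`alpha = 0`).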
Similar Papers
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
Machine Learning (Stat)
Finds when math models make wrong guesses.
Shrinkage to Infinity: Reducing Test Error by Inflating the Minimum Norm Interpolator in Linear Models
Statistics Theory
Improves computer learning with messy data.
A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective
Machine Learning (CS)
Stops AI from copying itself when making new pictures.