Shrinkage to Infinity: Reducing Test Error by Inflating the Minimum Norm Interpolator in Linear Models
By: Jake Freeman
Potential Business Impact:
Improves computer learning with messy data.
Hastie et al. (2022) found that ridge regularization is essential in high-dimensional linear regression $y=\beta^Tx + \epsilon$ with isotropic covariates $x\in \mathbb{R}^d$ and $n$ samples at fixed $d/n$. However, Hastie et al. (2022) also note that when the covariates are anisotropic and $\beta$ is aligned with the top eigenvectors of the population covariance, the "situation is qualitatively different." In the present article, we make this observation precise for linear regression with highly anisotropic covariances and diverging $d/n$. We find that simply scaling up (or inflating) the minimum $\ell_2$ norm interpolator by a constant greater than one can improve the generalization error. This is in sharp contrast to traditional regularization/shrinkage prescriptions. Moreover, we use a data-splitting technique to produce consistent estimators that achieve generalization error comparable to that of the optimally inflated minimum-norm interpolator. Our proof relies on apparently novel matching upper and lower bounds for expectations of Gaussian random projections for a general class of anisotropic covariance matrices when $d/n\to \infty$.
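To make the inflation idea concrete, here is a minimal simulation sketch in Python. It is not the paper's construction: the single-spike diagonal covariance, the choice of $\beta$ as the top eigenvector, the noise level, and all function names (`min_norm_interpolator`, `excess_risk`, `oracle_inflation`, `simulate`) are assumptions made for illustration. The sketch computes the minimum $\ell_2$ norm interpolator $\hat\beta = X^T(XX^T)^{-1}y$ and the oracle scaling $c^\ast = \hat\beta^T\Sigma\beta / \hat\beta^T\Sigma\hat\beta$, which minimizes the excess risk $(c\hat\beta-\beta)^T\Sigma(c\hat\beta-\beta)$ over $c$. The oracle uses the true $\beta$ and $\Sigma$, so it is only a benchmark; the data-splitting estimator described in the abstract is not reproduced here.

```python
import numpy as np


def min_norm_interpolator(X, y):
    # Minimum l2-norm solution of X b = y when d > n: X^T (X X^T)^{-1} y.
    return X.T @ np.linalg.solve(X @ X.T, y)


def excess_risk(b, beta, eigvals):
    # (b - beta)^T Sigma (b - beta) for a diagonal Sigma = diag(eigvals).
    diff = b - beta
    return float(np.sum(eigvals * diff**2))


def oracle_inflation(beta_hat, beta, eigvals):
    # Minimizer over c of excess_risk(c * beta_hat, beta, eigvals).
    # Requires the true beta and Sigma, so it is a benchmark only.
    num = np.sum(eigvals * beta_hat * beta)
    den = np.sum(eigvals * beta_hat**2)
    return float(num / den)


def simulate(n=50, d=2000, top=100.0, sigma=0.5, seed=0):
    # Assumed setup: one large eigenvalue `top`, all others equal to 1,
    # and beta aligned with the top eigenvector.
    rng = np.random.default_rng(seed)
    eigvals = np.ones(d)
    eigvals[0] = top
    beta = np.zeros(d)
    beta[0] = 1.0

    # Rows of X are N(0, diag(eigvals)); y follows the linear model with noise.
    X = rng.standard_normal((n, d)) * np.sqrt(eigvals)
    y = X @ beta + sigma * rng.standard_normal(n)

    beta_hat = min_norm_interpolator(X, y)
    c_star = oracle_inflation(beta_hat, beta, eigvals)
    risk_plain = excess_risk(beta_hat, beta, eigvals)
    risk_inflated = excess_risk(c_star * beta_hat, beta, eigvals)
    return c_star, risk_plain, risk_inflated


if __name__ == "__main__":
    c_star, risk_plain, risk_inflated = simulate()
    print(f"oracle inflation factor: {c_star:.3f}")
    print(f"excess risk, plain interpolator:    {risk_plain:.3f}")
    print(f"excess risk, inflated interpolator: {risk_inflated:.3f}")
```

Because the excess risk is a quadratic in $c$, the inflated interpolator can never do worse than the plain one in this sketch. Whether $c^\ast$ exceeds one depends on the assumed spectrum and the ratio $d/n$, so varying `top`, `d`, and `n` is a simple way to explore when inflation rather than shrinkage is favored.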
Similar Papers
Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression
Machine Learning (Stat)
Keeps AI from forgetting what it learned.
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
Machine Learning (Stat)
Finds when math models make wrong guesses.
Estimation and inference in error-in-operator model
Statistics Theory
Fixes computer guesses when data is messy.