Estimating a regression function under possibly heteroscedastic and heavy-tailed errors. Application to shape-restricted regression
By: Yannick Baraud, Guillaume Maillard
Potential Business Impact:
Enables reliable regression predictions from noisy real-world data whose errors may be heteroscedastic or heavy-tailed, settings in which standard least-squares fitting can fail.
We consider a regression framework in which the design points are deterministic and the errors are possibly non-i.i.d. and heavy-tailed (admitting a moment of order $p$ in $[1,2]$). Given a class of candidate regression functions, we propose a surrogate for the classical least squares estimator (LSE). For this new estimator, we establish a nonasymptotic risk bound with respect to the absolute loss that takes the form of an oracle-type inequality. This inequality shows that our estimator possesses natural adaptation properties with respect to some elements of the class. When the class consists of monotone functions or of convex functions on an interval, these adaptation properties are similar to those established in the literature for the LSE. However, unlike the LSE, our estimator is proven to remain stable under a possible heteroscedasticity of the errors and can even converge at a parametric rate (up to a logarithmic factor) in situations where the LSE is not consistent. We illustrate the performance of this new estimator over classes of regression functions satisfying a shape constraint: piecewise monotone and piecewise convex/concave functions, among other examples. The paper also contains approximation results by splines with degrees in $\{0,1\}$ and VC bounds for the dimensions of classes of level sets, which may be of independent interest.
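To make the contrast concrete, the sketch below compares the classical least-squares isotonic fit with a simple truncation-based robustification under heavy-tailed noise. This is only an illustration of the phenomenon the abstract describes, not the estimator constructed in the paper: the PAVA routine, the truncation level tau, and the Student-t noise with 1.5 degrees of freedom (so that only moments of order $p < 1.5$ exist, matching the setting $p \in [1,2]$) are all choices made here for the example.

```python
# Illustrative sketch only: NOT the paper's estimator, but a simple
# truncation-based robustification of least-squares isotonic regression,
# used to contrast behaviour under heavy-tailed noise.
import numpy as np

def pava(y):
    """Least-squares isotonic (nondecreasing) fit via pool-adjacent-violators."""
    sums, counts = [], []
    for v in np.asarray(y, dtype=float):
        sums.append(v); counts.append(1)
        # merge backwards while adjacent block means violate monotonicity
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            sums[-2] += sums[-1]; counts[-2] += counts[-1]
            sums.pop(); counts.pop()
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])

rng = np.random.default_rng(0)
n = 500
x = np.linspace(0.0, 1.0, n)
f = x ** 2                               # true monotone regression function
noise = rng.standard_t(df=1.5, size=n)   # heavy tails: moments only for p < 1.5
y = f + noise

lse = pava(y)                            # classical least-squares isotonic fit
tau = 5.0                                # hypothetical truncation level
robust = pava(np.clip(y, -tau, tau))     # truncate responses, then fit

print("mean absolute error, LSE    :", np.mean(np.abs(lse - f)))
print("mean absolute error, robust :", np.mean(np.abs(robust - f)))
```

Truncating the responses before fitting is a standard robustification device used here purely for illustration; the construction analyzed in the paper is different and comes with considerably more general guarantees.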
Similar Papers
Understanding Robust Machine Learning for Nonparametric Regression with Heavy-Tailed Noise
Machine Learning (CS)
Studies how robust learning methods behave in nonparametric regression when the noise is heavy-tailed.
Heavy Lasso: sparse penalized regression under heavy-tailed noise via data-augmented soft-thresholding
Methodology
Proposes a sparse penalized regression method that tolerates heavy-tailed noise via data-augmented soft-thresholding.
Heteroscedastic Growth Curve Modeling with Shape-Restricted Splines
Methodology
Models growth curves with shape-restricted splines while accounting for heteroscedastic errors.