The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights
By: Gabriel Clara, Yazan Mash'al
Potential Business Impact:
Explains how randomly weighting training data affects how fast and how accurately a model learns.
We analyze gradient descent with randomly weighted data points in a linear regression model, under a generic weighting distribution. This covers various forms of stochastic gradient descent and importance sampling, but also extends to weighting distributions taking arbitrary continuous values, thereby providing a unified framework for analyzing the impact of various kinds of noise on the training trajectory. We characterize the implicit regularization induced by the random weighting, connect it with weighted linear regression, and derive non-asymptotic bounds for convergence in first and second moments. Leveraging geometric moment contraction, we also investigate the stationary distribution induced by the added noise. Based on these results, we discuss how specific choices of weighting distribution influence both the underlying optimization problem and the statistical properties of the resulting estimator, and we present examples in which weightings that lead to fast convergence yield poor statistical performance.
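As a rough illustration of the setting (a minimal sketch, not code from the paper): the snippet below runs gradient descent on a synthetic linear regression problem where each data point's squared-error term is multiplied by a freshly drawn random weight at every step. The step size, the Bernoulli and exponential weight distributions, and all names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression data: y = X @ w_star + noise.
n, d = 200, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

def randomly_weighted_gd(X, y, sample_weights, eta=0.05, steps=2000):
    """Gradient descent on (1/2n) * sum_i a_i * (y_i - x_i @ w)**2,
    with the weights a_1, ..., a_n redrawn i.i.d. at every step."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        a = sample_weights(n)            # fresh random data weights
        residual = X @ w - y
        grad = X.T @ (a * residual) / n  # gradient of the weighted loss
        w -= eta * grad
    return w

# Bernoulli weights (kept with probability p, rescaled to mean 1)
# recover a form of mini-batch stochastic gradient descent.
p = 0.5
w_sgd = randomly_weighted_gd(X, y, lambda n: rng.binomial(1, p, n) / p)

# Continuous weights (here exponential with mean 1) are also covered
# by a generic weighting distribution.
w_exp = randomly_weighted_gd(X, y, lambda n: rng.exponential(1.0, n))

print("Bernoulli weights, error:", np.linalg.norm(w_sgd - w_star))
print("Exponential weights, error:", np.linalg.norm(w_exp - w_star))

Swapping in different sample_weights distributions is the knob the abstract refers to: the choice shapes both the effective optimization problem (via the induced implicit regularization) and the statistical behavior of the resulting estimator.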
Similar Papers
Online Linear Regression with Paid Stochastic Features
Machine Learning (CS)
Learns better by choosing how much to pay for cleaner data.
The Power of Random Features and the Limits of Distribution-Free Gradient Descent
Machine Learning (CS)
Shows why computers need data rules to learn.
Stochastic Gradients under Nuisances
Machine Learning (Stat)
Teaches computers to learn even with tricky, hidden info.