Towards Continuous-Time Approximations for Stochastic Gradient Descent without Replacement
By: Stefan Perko
Potential Business Impact:
Provides theoretical guarantees for the epoch-based training used by most machine learning models, which could make training faster and more reliable.
Gradient optimization algorithms using epochs, that is, those based on stochastic gradient descent without replacement (SGDo), are predominantly used to train machine learning models in practice. However, the mathematical theory of SGDo and related algorithms remains underexplored compared to their "with replacement" and "one-pass" counterparts. In this article, we propose a stochastic, continuous-time approximation to SGDo with additive noise based on a Young differential equation driven by a stochastic process we call an "epoched Brownian motion". We demonstrate its usefulness by proving the almost sure convergence of the continuous-time approximation for strongly convex objectives and learning rate schedules of the form $u_t = \frac{1}{(1+t)^\beta}$, $\beta \in (0,1)$. Moreover, we compute an upper bound on the asymptotic rate of almost sure convergence, which matches or improves upon previous results for SGDo.
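To make the discrete algorithm behind the abstract concrete, here is a minimal sketch of SGD without replacement with the learning rate schedule $u_t = \frac{1}{(1+t)^\beta}$ on a toy strongly convex (least-squares) objective. The objective, data, and variable names are illustrative assumptions for exposition, not the paper's setup; the defining feature of SGDo is that indices are reshuffled once per epoch and each data point is visited exactly once per epoch.

```python
# Sketch of SGD without replacement (SGDo) with schedule u_t = (1 + t)^(-beta).
# Toy least-squares objective; all data and names here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Strongly convex toy objective: f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2
n, d = 100, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th summand (a_i^T x - b_i)^2 / 2."""
    return (A[i] @ x - b[i]) * A[i]

beta = 0.7          # exponent in u_t = (1 + t)^(-beta), with beta in (0, 1)
x = np.zeros(d)
t = 0               # step counter, playing the role of discrete "time"

for epoch in range(50):
    # Without replacement: reshuffle once per epoch, then visit
    # every index exactly once before the next reshuffle.
    perm = rng.permutation(n)
    for i in perm:
        u = 1.0 / (1.0 + t) ** beta   # learning rate schedule from the abstract
        x = x - u * grad_i(x, i)
        t += 1

print("final iterate:", x)
```

The paper's contribution is a continuous-time approximation of this recursion (a Young differential equation driven by an "epoched Brownian motion"), not the discrete loop itself, which is shown here only to fix the setting.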
Similar Papers
Modified Equations for Stochastic Optimization
Probability
Makes computer learning faster and more accurate.
Revisiting Stochastic Approximation and Stochastic Gradient Descent
Optimization and Control
Helps computers learn better with messy data.
Stochastic Adaptive Gradient Descent Without Descent
Machine Learning (CS)
Makes computer learning faster without needing extra settings.