Stochastic Adaptive Gradient Descent Without Descent
By: Jean-François Aujol, Jérémie Bigot, Camille Castera
Potential Business Impact:
Makes computer learning faster without needing extra settings.
We introduce a new adaptive step-size strategy for convex optimization with stochastic gradients that exploits the local geometry of the objective function using only a first-order stochastic oracle and without any hyper-parameter tuning. The method is a theoretically grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show empirically that it competes with tuned baselines.
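To make the idea concrete, here is a minimal Python sketch of an AdGD-style adaptive step size plugged into plain SGD: the step size is estimated from observed iterate and gradient differences (a local curvature proxy ||x_k - x_{k-1}|| / (2 ||g_k - g_{k-1}||), capped by a slowly growing multiple of the previous step size), so no learning rate is tuned by hand. The function name sgd_adgd_style, the initialization choices, and the noisy least-squares example are illustrative assumptions, not the paper's exact algorithm or its theoretical guarantees.

```python
import numpy as np

def sgd_adgd_style(grad_fn, x0, n_iters=1000, lam0=1e-6, seed=0):
    """SGD with an AdGD-style adaptive step size (illustrative sketch).

    grad_fn(x, rng) must return a stochastic gradient estimate at x.
    lam0 is a small initial step size; the rule then adapts it from
    observed gradient differences, with no learning-rate tuning.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad_fn(x_prev, rng)
    lam_prev = lam0
    theta_prev = 0.0  # illustrative initialization of the step-size ratio
    x = x_prev - lam_prev * g_prev

    for _ in range(n_iters):
        g = grad_fn(x, rng)
        grad_diff = np.linalg.norm(g - g_prev)
        step_diff = np.linalg.norm(x - x_prev)
        # AdGD-style rule: local curvature estimate capped by a slowly
        # growing multiple of the previous step size.
        if grad_diff > 0:
            lam = min(np.sqrt(1.0 + theta_prev) * lam_prev,
                      step_diff / (2.0 * grad_diff))
        else:
            lam = np.sqrt(1.0 + theta_prev) * lam_prev
        theta_prev = lam / lam_prev
        x_prev, g_prev, lam_prev = x, g, lam
        x = x - lam * g
    return x

# Example usage: noisy least squares, gradient of 0.5*||A x - b||^2
# corrupted by additive Gaussian noise (hypothetical test problem).
A = np.random.default_rng(1).normal(size=(50, 10))
b = A @ np.ones(10)
noisy_grad = lambda x, rng: A.T @ (A @ x - b) + 0.1 * rng.normal(size=x.shape)
x_hat = sgd_adgd_style(noisy_grad, x0=np.zeros(10), n_iters=2000)
```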
Similar Papers
Adaptive Conditional Gradient Descent
Optimization and Control
Makes computer learning faster and better.
Gradient Descent with Provably Tuned Learning-rate Schedules
Machine Learning (CS)
Teaches computers to learn better, even when tricky.
Towards Continuous-Time Approximations for Stochastic Gradient Descent without Replacement
Machine Learning (CS)
Makes computer learning faster and more reliable.