Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
By: Jiaqi Li , Zhipeng Lou , Johannes Schmidt-Hieber and more
Potential Business Impact:
Makes computer learning work better in complex situations.
Stochastic Gradient Descent (SGD) and its Ruppert-Polyak averaged variant (ASGD) lie at the heart of modern large-scale learning, yet their theoretical properties in high-dimensional settings are rarely understood. In this paper, we provide rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional regimes. Our key innovation is to transfer powerful tools from high-dimensional time series to online learning. Specifically, by viewing SGD as a nonlinear autoregressive process and adapting existing coupling techniques, we prove the geometric-moment contraction of high-dimensional SGD for constant learning rates, thereby establishing asymptotic stationarity of the iterates. Building on this, we derive the $q$-th moment convergence of SGD and ASGD for any $q\ge2$ in general $\ell^s$-norms, and, in particular, the $\ell^{\infty}$-norm that is frequently adopted in high-dimensional sparse or structured models. Furthermore, we provide sharp high-probability concentration analysis which entails the probabilistic bound of high-dimensional ASGD. Beyond closing a critical gap in SGD theory, our proposed framework offers a novel toolkit for analyzing a broad class of high-dimensional learning algorithms.
Similar Papers
Quantitative Convergence Analysis of Projected Stochastic Gradient Descent for Non-Convex Losses via the Goldstein Subdifferential
Optimization and Control
Makes AI learn faster without needing extra tricks.
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Machine Learning (Stat)
Improves computer learning by making it more stable.
Almost Sure Convergence Analysis of Differentially Private Stochastic Gradient Methods
Machine Learning (CS)
Makes private AI learn better and more reliably.