Statistical Inference for Linear Functionals of Online Least-squares SGD when $t \gtrsim d^{1+\delta}$
By: Bhavya Agrawalla, Krishnakumar Balasubramanian, Promit Ghosal
Potential Business Impact:
Makes computer learning more accurate with less data.
Stochastic Gradient Descent (SGD) has become a cornerstone method in modern data science. However, deploying SGD in high-stakes applications necessitates rigorous quantification of its inherent uncertainty. In this work, we establish \emph{non-asymptotic Berry--Esseen bounds} for linear functionals of online least-squares SGD, thereby providing a Gaussian Central Limit Theorem (CLT) in a \emph{growing-dimensional regime}. Existing approaches to high-dimensional inference for projection parameters, such as~\cite{chang2023inference}, rely on inverting empirical covariance matrices and require $t \gtrsim d^{3/2}$ iterations to achieve finite-sample Berry--Esseen guarantees, rendering them computationally expensive and restrictive in the allowable dimensional scaling. In contrast, we show that a CLT holds for linear functionals of the SGD iterates when the number of iterations grows as $t \gtrsim d^{1+\delta}$ for any $\delta > 0$, significantly extending the dimensional regime permitted by prior works while improving computational efficiency. The proposed online SGD-based procedure operates in $\mathcal{O}(td)$ time and requires only $\mathcal{O}(d)$ memory, in contrast to the $\mathcal{O}(td^2 + d^3)$ runtime of covariance-inversion methods. To render the theory practically applicable, we further develop an \emph{online variance estimator} for the asymptotic variance appearing in the CLT and establish \emph{high-probability deviation bounds} for this estimator. Collectively, these results yield the first fully online and data-driven framework for constructing confidence intervals for linear functionals of SGD iterates in the near-optimal scaling regime $t \gtrsim d^{1+\delta}$.
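To make the overall recipe concrete: schematically, the CLT underlying such intervals takes the form $\sqrt{t}\,\langle v, \theta_t - \theta^\star \rangle / \sigma_{v,t} \Rightarrow \mathcal{N}(0,1)$ for a fixed direction $v$, with $\sigma_{v,t}$ replaced by an online estimate (notation here is schematic, not the paper's). Below is a minimal Python sketch, not taken from the paper, of an $\mathcal{O}(d)$-memory pipeline in this spirit: run online least-squares SGD on a data stream, track the functional $\langle v, \theta \rangle$, and pair it with a heuristic plug-in variance estimate to form a normal-approximation confidence interval. The step-size schedule, the Polyak--Ruppert averaging, and the auxiliary recursion approximating $A^{-1}v$ (with $A = \mathbb{E}[xx^\top]$) are all illustrative assumptions; the paper's variance estimator and its guarantees are for its own construction.

```python
import numpy as np

def online_sgd_ci(stream, d, v, eta0=0.5, alpha=0.505, z=1.96):
    """Hedged sketch: online least-squares SGD with Polyak-Ruppert averaging
    and a heuristic O(d)-memory plug-in variance estimate for <v, theta*>.
    Illustrative choices only, not the estimator analyzed in the paper."""
    theta = np.zeros(d)       # current SGD iterate
    theta_bar = np.zeros(d)   # running average of iterates
    w = np.zeros(d)           # auxiliary SGD estimate of A^{-1} v
    s2 = 0.0                  # running sum of squared influence terms
    t = 0
    for x, y in stream:
        t += 1
        eta = eta0 * t ** (-alpha)              # decaying step size (assumption)
        resid = x @ theta - y
        theta -= eta * resid * x                # SGD step for the squared loss
        theta_bar += (theta - theta_bar) / t    # online Polyak-Ruppert average
        w -= eta * (x * (x @ w) - v)            # solves A w = v by SGD
        s2 += (resid * (x @ w)) ** 2            # influence of step t on <v, theta>
    est = v @ theta_bar
    se = np.sqrt(s2) / t                        # heuristic standard error
    return est, (est - z * se, est + z * se)

# Usage on a synthetic stream (hypothetical example):
rng = np.random.default_rng(0)
d, t_total = 20, 50_000
theta_star = rng.normal(size=d) / np.sqrt(d)

def stream():
    for _ in range(t_total):
        x = rng.normal(size=d)
        yield x, x @ theta_star + 0.1 * rng.normal()

v = np.zeros(d); v[0] = 1.0                     # functional: first coordinate
est, (lo, hi) = online_sgd_ci(stream(), d, v)
```

Each step touches only $d$-dimensional vectors, matching the abstract's $\mathcal{O}(td)$-time, $\mathcal{O}(d)$-memory claim; the auxiliary recursion for $w$ avoids the $\mathcal{O}(d^3)$ covariance inversion that the sandwich-style variance $v^\top A^{-1} S A^{-1} v$ would otherwise require.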
Similar Papers
Online Inference for Quantiles by Constant Learning-Rate Stochastic Gradient Descent
Machine Learning (Stat)
Makes computer learning more accurate and reliable.
Improved Central Limit Theorem and Bootstrap Approximations for Linear Stochastic Approximation
Machine Learning (Stat)
Makes computer learning more accurate and faster.
Sharp Gaussian approximations for Decentralized Federated Learning
Machine Learning (Stat)
Detects computer attacks by watching how they learn.