Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation
By: Tianqi Qiao, Marie Maros
Potential Business Impact:
Solves hard math problems faster with fewer steps.
We propose and study Sparse Polyak, a variant of Polyak's adaptive step size designed to solve high-dimensional statistical estimation problems in which the problem dimension is allowed to grow much faster than the sample size. In such settings, the standard Polyak step size performs poorly, requiring an increasing number of iterations to achieve optimal statistical precision, even when the problem remains well conditioned and/or the achievable precision itself does not degrade with problem size. We trace this limitation to a mismatch in how smoothness is measured: in high dimensions, it is no longer effective to estimate the global Lipschitz smoothness constant. Instead, it is more appropriate to estimate the smoothness restricted to the directions relevant to the problem (the restricted Lipschitz smoothness constant). Sparse Polyak overcomes this issue by modifying the step size so that it estimates the restricted Lipschitz smoothness constant instead. We support our approach with both theoretical analysis and numerical experiments, demonstrating its improved performance.
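The abstract does not state the exact update rule, so the following is only a minimal illustrative sketch of the idea, not the paper's method: a classical Polyak step size, eta = (f(x) - f*) / ||grad||^2, modified so that the squared gradient norm in the denominator is restricted to the s largest-magnitude coordinates (a stand-in for restricted smoothness), combined with iterative hard thresholding on a sparse least-squares problem. All function names, the sparsity level `s`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -s)[-s:]
    out[keep] = v[keep]
    return out

def sparse_polyak_iht(X, y, s, f_star=0.0, iters=500):
    """Hard-thresholded gradient descent on f(b) = 0.5 * ||X b - y||^2 / n,
    with a Polyak-style step whose denominator uses only the s largest
    gradient coordinates (illustrative proxy for restricted smoothness)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        r = X @ beta - y
        f = 0.5 * np.dot(r, r) / n
        g = X.T @ r / n
        g_s = hard_threshold(g, s)       # gradient restricted to top-s coords
        denom = np.dot(g_s, g_s)
        if denom == 0.0:                 # stationary on the restricted support
            break
        eta = (f - f_star) / denom       # Polyak rule with the restricted norm
        beta = hard_threshold(beta - eta * g, s)
    return beta

# Synthetic noiseless sparse recovery problem, so the optimal value f* is 0.
rng = np.random.default_rng(0)
n, p, s = 200, 500, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true
beta_hat = sparse_polyak_iht(X, y, s)
```

Because only s of the p coordinates are active, dividing by the full gradient norm would make the step unnecessarily small as p grows; restricting the denominator keeps the step size tied to the curvature along the sparse directions that actually matter, which is the intuition the abstract describes.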
Similar Papers
Sparse Polyak with optimal thresholding operators for high-dimensional M-estimation
Machine Learning (Stat)
Finds hidden patterns in big data accurately.
Analysis of an Idealized Stochastic Polyak Method and its Application to Black-Box Model Distillation
Machine Learning (CS)
Makes AI learn faster and smaller.
Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients
Optimization and Control
Makes computer learning faster and better.