Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises
By: Xinwen Zhang, Yihan Zhang, Hongchang Gao
Potential Business Impact:
Teaches computers to learn better with messy data.
Existing decentralized stochastic bilevel optimization methods assume that the lower-level loss function is strongly convex and that the stochastic gradient noise has finite variance. These strong assumptions are typically not satisfied by real-world machine learning models. To address these limitations, we develop a novel decentralized stochastic bilevel optimization algorithm for nonconvex bilevel problems under heavy-tailed noise. Specifically, we propose a normalized stochastic variance-reduced bilevel gradient descent algorithm that does not rely on any clipping operation. Moreover, we establish its convergence rate through a novel analysis that bounds the interdependent gradient sequences arising under heavy-tailed noise in nonconvex decentralized bilevel optimization. To the best of our knowledge, this is the first decentralized bilevel optimization algorithm with rigorous theoretical guarantees under heavy-tailed noise. Extensive experimental results confirm the effectiveness of our algorithm in handling heavy-tailed noise.
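To make the two key ingredients named in the abstract concrete, here is a minimal single-level sketch: a STORM-style variance-reduced gradient estimator combined with a normalized, clipping-free update, run on a toy quadratic corrupted by heavy-tailed noise. The quadratic loss, the symmetric Pareto noise model, and the hyperparameters eta and beta are illustrative assumptions; the paper's actual algorithm additionally performs bilevel hypergradient estimation and decentralized communication, both omitted here.

```python
import numpy as np

# Toy single-level demonstration of a normalized variance-reduced update
# under heavy-tailed gradient noise. NOT the paper's full decentralized
# bilevel algorithm; loss, noise model, and hyperparameters are assumed.

rng = np.random.default_rng(0)
d = 10

def sample_noise():
    # Symmetric Pareto noise with tail index 1.5: zero mean, infinite variance.
    return rng.pareto(1.5, size=d) * rng.choice([-1.0, 1.0], size=d)

def stochastic_grad(x, noise):
    # True gradient of f(x) = 0.5 * ||x||^2 is x; add heavy-tailed noise.
    return x + noise

eta, beta = 0.05, 0.1                    # step size and momentum weight (assumed)
x = rng.normal(size=d)
v = stochastic_grad(x, sample_noise())   # initial gradient estimate

for t in range(2000):
    # Normalized step: its length is eta no matter how large v is,
    # which is what removes the need for gradient clipping.
    x_new = x - eta * v / (np.linalg.norm(v) + 1e-12)
    xi = sample_noise()
    # STORM recursion: fresh gradient plus a correction that evaluates the
    # same noise realization at the previous iterate, reducing variance.
    v = stochastic_grad(x_new, xi) + (1.0 - beta) * (v - stochastic_grad(x, xi))
    x = x_new

print("final ||x|| =", np.linalg.norm(x))
```

Because the step length is fixed by normalization, a single extreme noise draw can at most perturb the iterate by eta, while the recursive estimator averages the heavy-tailed samples over time; this is the intuition behind dispensing with clipping.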
Similar Papers
Stochastic Bilevel Optimization with Heavy-Tailed Noise
Machine Learning (CS)
Teaches computers to learn better from messy data.
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
Optimization and Control
Helps computers learn better with messy data.
Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization
Machine Learning (CS)
Helps computers solve tough problems without knowing how hard they are.