Provably Convergent Decentralized Optimization over Directed Graphs under Generalized Smoothness
By: Yanan Bo, Yongqiang Wang
Potential Business Impact:
Helps networks of computers learn together faster and more reliably, even with messy, uneven data.
Decentralized optimization has become a fundamental tool for large-scale learning systems; however, most existing methods rely on the classical Lipschitz-smoothness assumption, which is often violated in problems with rapidly varying gradients. Motivated by this limitation, we study decentralized optimization under the generalized $(L_0, L_1)$-smoothness framework, in which the Hessian norm is allowed to grow linearly with the gradient norm, thereby accommodating objectives whose gradients vary far more rapidly than classical Lipschitz smoothness permits. We integrate gradient tracking with gradient clipping and carefully design the clipping threshold to ensure accurate convergence over directed communication graphs under generalized smoothness. In contrast to existing distributed optimization results under generalized smoothness, which require a bounded gradient-dissimilarity assumption, our results remain valid even when the gradient dissimilarity is unbounded, making the proposed framework better suited to realistic heterogeneous-data environments. We validate our approach via numerical experiments on standard benchmarks, including LIBSVM datasets and CIFAR-10, using regularized logistic regression and convolutional neural networks, demonstrating greater stability and faster convergence than existing methods.
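To make the ingredients concrete: the generalized $(L_0, L_1)$-smoothness condition described in the abstract bounds the Hessian as $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$, and the algorithmic recipe combines gradient tracking with clipped gradients over a directed graph. The sketch below is an illustrative Python toy, not the paper's algorithm: it assumes push-pull/AB-style mixing with a row-stochastic matrix `A` and a column-stochastic matrix `B` on the same digraph, and it uses a fixed clipping threshold `tau`, whereas the paper's contribution is a carefully designed threshold (omitted here).

```python
import numpy as np

def clip(g, tau):
    """Scale g so its norm does not exceed the threshold tau."""
    n = np.linalg.norm(g)
    return g if n <= tau else (tau / n) * g

def decentralized_gt_clip(grads, x0, A, B, alpha=0.05, tau=1.5, iters=500):
    """
    Illustrative gradient-tracking + clipping sketch over a directed graph
    (push-pull / AB-style mixing). A is row-stochastic, B is column-stochastic,
    both respecting the digraph's edges; grads[i](x) returns node i's local
    gradient. The clipping placement and fixed threshold are simplifying
    assumptions, not the paper's threshold design.
    """
    n = len(grads)
    x = np.tile(x0, (n, 1)).astype(float)                    # one local iterate per node
    g = np.array([clip(grads[i](x[i]), tau) for i in range(n)])
    y = g.copy()                                              # gradient trackers
    for _ in range(iters):
        x = A @ x - alpha * y                                 # mix with in-neighbors, step along tracker
        g_new = np.array([clip(grads[i](x[i]), tau) for i in range(n)])
        y = B @ y + g_new - g                                 # track the average clipped gradient
        g = g_new
    return x.mean(axis=0)

# Toy example: 3 nodes on the directed ring 0 -> 1 -> 2 -> 0 (with self-loops),
# each minimizing 0.5 * ||x - t_i||^2 for a different local target t_i.
targets = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([-1.0, 1.0])]
grads = [lambda x, t=t: x - t for t in targets]
A = np.array([[0.7, 0.0, 0.3],     # row-stochastic (not column-stochastic)
              [0.4, 0.6, 0.0],
              [0.0, 0.2, 0.8]])
B = np.array([[0.6, 0.0, 0.5],     # column-stochastic (not row-stochastic)
              [0.4, 0.7, 0.0],
              [0.0, 0.3, 0.5]])
print(decentralized_gt_clip(grads, np.zeros(2), A, B))        # should settle near [0., 1.], the average minimizer
```

In this toy run the clipping only binds during the first iterations (when local gradients are large) and becomes inactive near the optimum, so the nodes reach the exact minimizer of the aggregate objective; choosing the threshold so that clipping never prevents exact convergence is precisely the design question the paper addresses under generalized smoothness.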
Similar Papers
Decentralized Stochastic Nonconvex Optimization under the Relaxed Smoothness
Optimization and Control
Helps many computers work together to solve problems.
Mirror Descent Under Generalized Smoothness
Optimization and Control
Makes computer learning faster on tricky problems.
Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness
Optimization and Control
Helps computers solve math problems faster.