Tight Analysis of Decentralized SGD: A Markov Chain Perspective
By: Lucas Versini, Paul Mangold, Aymeric Dieuleveut
Potential Business Impact:
Shows that decentralized machine learning can train faster as more client machines join, even over poorly connected networks.
We propose a novel analysis of the Decentralized Stochastic Gradient Descent (DSGD) algorithm with constant step size, interpreting the iterates of the algorithm as a Markov chain. We show that DSGD converges to a stationary distribution whose bias decomposes, to first order, into two components: one due to decentralization (growing with the graph's spectral gap and the clients' heterogeneity) and one due to stochasticity. Remarkably, the variance of the local parameters is, to first order, inversely proportional to the number of clients, regardless of the network topology, and even when the clients' iterates are not averaged at the end. As a consequence of our analysis, we obtain non-asymptotic convergence bounds for the clients' local iterates, confirming that DSGD enjoys a linear speed-up in the number of clients and that the network topology only affects higher-order terms.
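To make the setting concrete, here is a minimal, self-contained sketch of DSGD with constant step size, the algorithm analyzed in the abstract. The ring topology, quadratic local objectives, noise level, and all hyperparameters below are illustrative assumptions of ours, not taken from the paper.

```python
# Minimal sketch of Decentralized SGD (DSGD) with constant step size.
# Assumptions (ours, for illustration): a ring network, heterogeneous
# quadratic local objectives f_i(x) = ||x - b_i||^2 / 2, and Gaussian
# gradient noise. With a constant step, iterates fluctuate around a
# stationary distribution rather than converging to a point.
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, step_size, n_steps = 16, 5, 0.05, 2000

# Client optima b_i; their spread models client heterogeneity.
b = rng.normal(size=(n_clients, dim))

# Doubly stochastic mixing matrix W for a ring: each client averages
# with its two neighbors.
W = np.zeros((n_clients, n_clients))
for i in range(n_clients):
    W[i, i] = 0.5
    W[i, (i - 1) % n_clients] = 0.25
    W[i, (i + 1) % n_clients] = 0.25

x = np.zeros((n_clients, dim))  # one local parameter vector per client
for _ in range(n_steps):
    grads = (x - b) + 0.1 * rng.normal(size=x.shape)  # stochastic gradients
    x = W @ (x - step_size * grads)  # local SGD step, then gossip averaging

# The network average hovers near the global optimum mean(b); the spread
# of local iterates around that average reflects the decentralization term.
print("mean iterate:     ", x.mean(axis=0))
print("global optimum:   ", b.mean(axis=0))
print("spread across clients:", x.var(axis=0).sum())
```

Rerunning this with larger n_clients (and the noise kept fixed) shrinks the fluctuations of the mean iterate, consistent with the linear speed-up discussed in the abstract.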
Similar Papers
Almost Sure Convergence Analysis of Differentially Private Stochastic Gradient Methods
Machine Learning (CS)
Proves that differentially private stochastic gradient methods converge reliably (almost surely), making private AI training more dependable.
Cooperative SGD with Dynamic Mixing Matrices
Machine Learning (CS)
Speeds up cooperative SGD by letting the communication (mixing) matrices change over time.