Accelerating Decentralized Optimization via Overlapping Local Steps
By: Yijie Zhou, Shi Pu
Potential Business Impact:
Speeds up computer learning by sharing work faster.
Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, existing methods often suffer from communication bottlenecks due to frequent synchronization between nodes. We present Overlapping Local Decentralized SGD (OLDSGD), a novel approach that accelerates decentralized training by overlapping computation with communication, significantly reducing network idle time. Through a deliberately designed update rule, OLDSGD preserves the same average update as Local SGD while avoiding communication-induced stalls. Theoretically, we establish non-asymptotic convergence rates for smooth non-convex objectives, showing that OLDSGD retains the same iteration complexity as standard Local Decentralized SGD while improving per-iteration runtime. Empirical results demonstrate OLDSGD's consistent improvements in wall-clock convergence time under different levels of communication delay. Requiring only minimal modifications to existing frameworks, OLDSGD offers a practical solution for faster decentralized learning without sacrificing theoretical guarantees.
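To illustrate the overlap idea described in the abstract, here is a minimal simulated sketch, not the paper's exact update rule: each node takes a snapshot of its model, starts a gossip average on that snapshot (which would run in the background as non-blocking communication in a real system), keeps running local SGD steps in the meantime, and then merges the stale gossip result in a way that leaves the network-average update identical to synchronous Local SGD. The node count, ring mixing matrix, step size, and toy quadratic objectives are illustrative assumptions.

```python
# Sketch of overlapping local steps with in-flight gossip averaging
# (illustrative only; not the authors' exact OLDSGD update).
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, local_steps, rounds, lr = 4, 10, 5, 20, 0.05

# Toy heterogeneous objectives: f_i(x) = 0.5 * ||x - b_i||^2 (assumption).
targets = rng.normal(size=(n_nodes, dim))
models = rng.normal(size=(n_nodes, dim))

# Doubly stochastic mixing matrix for a ring topology (assumption).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

def stoch_grad(x, b):
    """Stochastic gradient of 0.5 * ||x - b||^2 with additive noise."""
    return (x - b) + 0.1 * rng.normal(size=x.shape)

for r in range(rounds):
    # 1) "Launch" gossip on a snapshot of the current models; in practice
    #    this communication runs in the background while the local steps
    #    below proceed, hiding the network latency.
    snapshot = models.copy()
    mixed_snapshot = W @ snapshot  # arrives later, computed on stale models

    # 2) Local SGD steps overlap with the in-flight communication.
    for _ in range(local_steps):
        for i in range(n_nodes):
            models[i] -= lr * stoch_grad(models[i], targets[i])

    # 3) Merge the (stale) gossip result: mix the pre-communication part,
    #    keep the locally accumulated progress. Because W is doubly
    #    stochastic, the network average matches synchronous Local SGD.
    models = mixed_snapshot + (models - snapshot)

print("disagreement:", np.linalg.norm(models - models.mean(0)))
print("avg distance to optimum:", np.linalg.norm(models.mean(0) - targets.mean(0)))
```

In this toy setup the merge step is what keeps the average update unchanged: mixing only the snapshot leaves the mean of the models exactly where synchronous averaging would have put it, while the local progress computed during the communication is retained.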
Similar Papers
Enhancing Parallelism in Decentralized Stochastic Convex Optimization
Machine Learning (CS)
Lets more computers learn together faster.
Provable Acceleration of Distributed Optimization with Local Updates
Systems and Control
Makes computer teams solve problems faster.
Cooperative SGD with Dynamic Mixing Matrices
Machine Learning (CS)
Makes computer learning faster and better.