Reconciling Communication Compression and Byzantine-Robustness in Distributed Learning
By: Diksha Gupta, Nirupam Gupta, Chuan Xu, and more
Potential Business Impact:
Makes computers learn faster with less data sent.
Distributed learning (DL) enables scalable model training over decentralized data, but remains challenged by Byzantine faults and high communication costs. While both issues have been studied extensively in isolation, their interaction is less explored. Prior work shows that naively combining communication compression with Byzantine-robust aggregation degrades resilience to faulty nodes (or workers). The state-of-the-art algorithm, namely Byz-DASHA-PAGE [29], uses a momentum-based variance reduction scheme to mitigate the detrimental impact of compression noise on Byzantine-robustness. We propose a new algorithm, named RoSDHB, that integrates classic Polyak momentum with a new coordinated compression mechanism. We show that RoSDHB performs comparably to Byz-DASHA-PAGE under the standard (G, B)-gradient dissimilarity heterogeneity model, while relying on fewer assumptions. In particular, we only assume Lipschitz smoothness of the average loss function of the honest workers, in contrast to [29], which additionally assumes the stronger smoothness condition of bounded global Hessian variance. Empirical results on a benchmark image classification task show that RoSDHB achieves strong robustness with significant communication savings.
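The following is a minimal sketch, not the authors' implementation, of the kind of training loop the abstract describes: each honest worker maintains Polyak (heavy-ball) momentum, transmits a compressed momentum vector, and the server combines the received vectors with a Byzantine-robust aggregator. The shared-randomness rand-k compressor (all workers keep the same coordinates in a given round), the coordinate-wise trimmed mean, and the names grad_fn, rosdhb_sketch, and their parameters are illustrative assumptions; the paper's actual coordinated compression mechanism and aggregation rule may differ.

```python
# Sketch only: illustrates compressed Polyak momentum with robust aggregation,
# under the assumptions stated above.
import numpy as np


def coordinated_rand_k(v, idx):
    """Keep only the coordinates in idx (the same subset for every worker this
    round) -- a simple stand-in for a coordinated compressor."""
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out


def trimmed_mean(msgs, f):
    """Coordinate-wise trimmed mean that discards the f largest and f smallest
    values per coordinate -- one standard Byzantine-robust aggregator."""
    s = np.sort(np.stack(msgs), axis=0)
    return s[f: len(msgs) - f].mean(axis=0)


def rosdhb_sketch(grad_fn, x0, n_workers, n_byz, steps,
                  lr=0.1, beta=0.9, k=10, seed=0):
    """Hypothetical driver: grad_fn(x, i) returns worker i's stochastic
    gradient at model x. Requires n_workers > 2 * n_byz."""
    rng = np.random.default_rng(seed)   # shared seed coordinates the compressor
    x = x0.astype(float)
    momentum = [np.zeros_like(x) for _ in range(n_workers)]
    for _ in range(steps):
        idx = rng.choice(x.size, size=k, replace=False)  # common sparsity pattern
        msgs = []
        for i in range(n_workers):
            g = grad_fn(x, i)                            # stochastic gradient
            momentum[i] = beta * momentum[i] + g         # Polyak's momentum
            msgs.append(coordinated_rand_k(momentum[i], idx))
        # A Byzantine worker could place an arbitrary vector in msgs; robust
        # aggregation bounds its influence on the update.
        x = x - lr * trimmed_mean(msgs, n_byz)
    return x
```

As a quick sanity check, grad_fn could return the gradient of a toy quadratic plus noise for honest worker indices and an arbitrarily large vector for Byzantine ones; the trimmed mean keeps the iterates bounded while each round transmits only k coordinates per worker.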
Similar Papers
Coded Robust Aggregation for Distributed Learning under Byzantine Attacks
Machine Learning (CS)
Protects computer learning from bad data.
Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation
Machine Learning (CS)
Keeps AI learning safe from bad data.
Fed-DPRoC: Communication-Efficient Differentially Private and Robust Federated Learning
Machine Learning (CS)
Keeps private data safe during learning.