Communication Compression for Distributed Learning with Aggregate and Server-Guided Feedback
By: Tomas Ortega, Chun-Yin Huang, Xiaoxiao Li, et al.
Distributed learning, particularly Federated Learning (FL), faces a significant communication bottleneck, especially in the uplink transmission of client-to-server updates, which is often constrained by asymmetric bandwidth limits at the edge. Biased compression techniques are effective in practice but require error feedback mechanisms to provide theoretical guarantees and to ensure convergence when compression is aggressive. Standard error feedback, however, relies on client-specific control variates, which violates user privacy and is incompatible with the stateless clients common in large-scale FL. This paper proposes two novel frameworks that enable biased compression without client-side state or control variates. The first, Compressed Aggregate Feedback (CAFe), uses the globally aggregated update from the previous round as a shared control variate for all clients. The second, Server-Guided Compressed Aggregate Feedback (CAFe-S), extends this idea to scenarios where the server possesses a small private dataset; it generates a server-guided candidate update that serves as a more accurate predictor. We consider Distributed Gradient Descent (DGD) as a representative algorithm and analytically prove CAFe's superiority to Distributed Compressed Gradient Descent (DCGD) with biased compression in the non-convex regime with bounded gradient dissimilarity. We further prove that CAFe-S converges to a stationary point, with a rate that improves as the server's data become more representative. Experimental results in FL scenarios validate the superiority of our approaches over existing compression schemes.
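Below is a minimal Python sketch of the aggregate-feedback idea as described in the abstract: each client compresses the deviation of its local DGD update from the previous round's globally aggregated update, which acts as the shared control variate, and the server adds that aggregate back before averaging. The top-k compressor, the quadratic local objectives, and all hyper-parameters are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
# Illustrative sketch of CAFe-style aggregate feedback with a biased (top-k)
# compressor in a Distributed Gradient Descent setting. All specifics here
# (compressor, toy objectives, step size, round count) are assumptions.
import numpy as np

def top_k(v, k):
    """Biased top-k sparsifier: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
d, n_clients, k, lr, rounds = 50, 10, 5, 0.1, 200

# Each client i holds a local quadratic objective f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_clients)]
b = [rng.standard_normal(d) for _ in range(n_clients)]

x = np.zeros(d)
prev_agg = np.zeros(d)  # shared control variate: last round's aggregated update

for t in range(rounds):
    msgs = []
    for Ai, bi in zip(A, b):
        g = Ai.T @ (Ai @ x - bi)                  # local gradient
        update = -lr * g                           # local update direction
        msgs.append(top_k(update - prev_agg, k))   # compress deviation from shared predictor
    # Server: undo the shift with the same shared control variate, then aggregate.
    agg = prev_agg + np.mean(msgs, axis=0)
    x += agg
    prev_agg = agg  # broadcast alongside the new model; reused next round

print("final mean loss:",
      np.mean([0.5 * np.linalg.norm(Ai @ x - bi) ** 2 for Ai, bi in zip(A, b)]))
```

With an identity compressor the server recovers plain DGD exactly, since the shared control variate cancels; with aggressive top-k compression, the previous aggregate serves as a predictor so that only the (typically small) deviation must survive compression.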
Similar Papers
Gradient Compression and Correlation Driven Federated Learning for Wireless Traffic Prediction
Distributed, Parallel, and Cluster Computing
Applies gradient compression and correlation-driven techniques to reduce communication in federated wireless traffic prediction.
Convergence Analysis of Asynchronous Federated Learning with Gradient Compression for Non-Convex Optimization
Machine Learning (CS)
Provides a convergence analysis of asynchronous federated learning with gradient compression for non-convex optimization.
Personalized Federated Learning with Bidirectional Communication Compression via One-Bit Random Sketching
Machine Learning (CS)
Personalizes federated learning while compressing both uplink and downlink communication via one-bit random sketching.