Fully Adaptive Stepsizes: Which System Benefits More -- Centralized or Decentralized?
By: Diyako Ghaderyan, Stefan Werner
Potential Business Impact:
Lets each computer in a network adjust its own learning speed, so learning is faster without manual tuning.
In decentralized optimization, the choice of stepsize plays a critical role in algorithm performance. A common approach is to use a shared stepsize across all agents to ensure convergence. However, selecting an optimal stepsize often requires careful tuning, which can be time-consuming and may lead to slow convergence, especially when there is significant variation in the smoothness (L-smoothness) of local objective functions across agents. Individually tuning stepsizes per agent is also impractical, particularly in large-scale networks. To address these limitations, we propose AdGT, an adaptive gradient tracking method that enables each agent to adjust its stepsize based on the smoothness of its local objective. We prove that AdGT generates a sequence of iterates that converges to the optimal consensus solution. Through numerical experiments, we compare AdGT with fixed-stepsize gradient tracking methods and demonstrate its superior performance. Additionally, we compare AdGT with adaptive gradient descent (AdGD) in a centralized setting and observe that fully adaptive stepsizes offer greater benefits in decentralized networks than in centralized ones.
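The abstract does not spell out the AdGT update rule, but the idea of per-agent adaptive stepsizes inside gradient tracking can be sketched. Below is a minimal, hypothetical Python sketch that combines a standard gradient-tracking (DIGing-style) iteration with an AdGD-style stepsize estimated from each agent's own gradient differences; the exact AdGT rule and its convergence guarantees are given in the paper, and all problem data here (least-squares objectives on a ring network, Metropolis-style mixing weights) are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: per-agent adaptive stepsizes in a gradient-tracking loop.
# The stepsize rule is an AdGD-style local smoothness estimate; the actual AdGT
# update may differ -- consult the paper for the real algorithm and analysis.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 5, 3

# Local least-squares objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 with
# deliberately different scales, so local smoothness varies across agents.
A = [rng.normal(scale=s, size=(10, dim)) for s in (0.5, 1.0, 2.0, 4.0, 8.0)]
b = [Ai @ rng.normal(size=dim) for Ai in A]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for a ring graph (each agent averages
# with its two neighbors and itself).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = W[i, (i - 1) % n_agents] = W[i, (i + 1) % n_agents] = 1.0 / 3.0

x = np.zeros((n_agents, dim))                             # local iterates
y = np.array([grad(i, x[i]) for i in range(n_agents)])    # gradient trackers
g_prev = y.copy()
alpha = np.full(n_agents, 1e-3)                           # per-agent stepsizes
theta = np.full(n_agents, np.inf)                         # stepsize growth ratios

for k in range(500):
    x_new = W @ x - alpha[:, None] * y                    # consensus + tracked-gradient step
    g_new = np.array([grad(i, x_new[i]) for i in range(n_agents)])
    y = W @ y + g_new - g_prev                            # gradient-tracking update
    for i in range(n_agents):
        # Estimate the local Lipschitz constant from consecutive local iterates
        # and gradients, then set the stepsize to roughly 1 / (2 * local L),
        # capped by a growth factor (AdGD-style rule, used here as an assumption).
        dx = np.linalg.norm(x_new[i] - x[i])
        dg = np.linalg.norm(g_new[i] - g_prev[i])
        cand = dx / (2.0 * dg) if dg > 0 else np.inf
        new_alpha = min(np.sqrt(1.0 + theta[i]) * alpha[i], cand)
        theta[i] = new_alpha / alpha[i]
        alpha[i] = new_alpha
    x, g_prev = x_new, g_new

print("consensus residual:", np.linalg.norm(x - x.mean(axis=0)))
print("per-agent stepsizes:", alpha)
```

The design point this sketch illustrates is the one the abstract makes: agents whose local objectives are smoother end up taking larger steps, while a single shared stepsize would have to be tuned to the worst-case (least smooth) agent.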
Similar Papers
Distributed Optimization and Learning for Automated Stepsize Selection with Finite Time Coordination
Systems and Control
Makes computers learn faster and more accurately.
Adaptive control mechanisms in gradient descent algorithms
Optimization and Control
Makes computer learning faster and more accurate.
Adaptive Stepsizing for Stochastic Gradient Langevin Dynamics in Bayesian Neural Networks
Machine Learning (CS)
Makes computer learning more accurate and stable.