Dynamically Weighted Momentum with Adaptive Step Sizes for Efficient Deep Network Training
By: Zhifeng Wang, Longlong Li, Chunyan Zeng
Potential Business Impact:
Helps deep learning models train faster and reach higher accuracy.
Within current deep learning research, despite the widespread use of optimization algorithms such as Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam), these methods still fall short when learning efficiency fluctuates, when models grow complex, and when the optimization problem is non-convex. These shortcomings stem largely from the algorithms' limitations in handling complex data structures and models, for instance the difficulty of selecting an appropriate learning rate, escaping local optima, and navigating high-dimensional spaces. To address these issues, this paper introduces a novel optimization algorithm named DWMGrad. Building on traditional methods, DWMGrad incorporates a guidance mechanism that uses historical information to update the momentum and learning rate dynamically, allowing the optimizer to flexibly adjust how strongly it relies on past gradients and to adapt to different training scenarios. This strategy not only lets the optimizer adapt to changing environments and task complexities but, as validated through extensive experimentation, also enables DWMGrad to achieve faster convergence and higher accuracy across a wide range of scenarios.
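The abstract describes the mechanism only at a high level, so the following is a minimal sketch, not the published DWMGrad update rule, of what dynamically weighted momentum with an adaptive step size can look like in code: the momentum coefficient is adjusted from the agreement between the current gradient and the accumulated history, and the step size is scaled by a running estimate of gradient magnitude. The class name, hyperparameters, and the alignment heuristic are all illustrative assumptions.

import numpy as np

class DynamicMomentumSketch:
    """Illustrative optimizer: the momentum weight and the step size both
    adapt from gradient history. This is NOT the paper's DWMGrad update,
    only a sketch of the general idea described in the abstract."""

    def __init__(self, lr=1e-3, beta_min=0.5, beta_max=0.95, eps=1e-8):
        self.lr = lr
        self.beta_min = beta_min   # least reliance on historical momentum
        self.beta_max = beta_max   # most reliance on historical momentum
        self.eps = eps
        self.m = None              # momentum buffer (historical direction)
        self.v = None              # running second moment (for step size)

    def step(self, params, grad):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)

        # Agreement between the current gradient and accumulated momentum:
        # close to +1 when they point the same way, -1 when they oppose.
        denom = np.linalg.norm(grad) * np.linalg.norm(self.m) + self.eps
        align = float(np.dot(grad.ravel(), self.m.ravel()) / denom)

        # Dynamically weighted momentum: rely more on history when it agrees
        # with the current gradient, less when the landscape has changed.
        beta = self.beta_min + 0.5 * (self.beta_max - self.beta_min) * (1.0 + align)

        self.m = beta * self.m + (1.0 - beta) * grad
        self.v = 0.9 * self.v + 0.1 * grad ** 2  # history of gradient magnitudes

        # Adaptive step size: shrink the step along coordinates with large
        # historical gradient magnitude (Adam-style scaling).
        return params - self.lr * self.m / (np.sqrt(self.v) + self.eps)

# Toy usage: minimize f(x) = ||x||^2, whose gradient is 2x.
opt = DynamicMomentumSketch(lr=0.1)
x = np.array([3.0, -2.0])
for _ in range(200):
    x = opt.step(x, 2.0 * x)
print(x)  # x should approach the minimum at the origin

The design choice illustrated here is the one the abstract emphasizes: instead of fixing the momentum coefficient and learning-rate schedule in advance, both are recomputed each step from accumulated gradient information, so the optimizer can shift between trusting history and reacting to the current gradient as training conditions change.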
Similar Papers
Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization
Machine Learning (CS)
Makes computer learning faster by changing memory.
Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
Machine Learning (CS)
Makes computer learning work better, even with bad settings.
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Machine Learning (Stat)
Improves computer learning by making it more stable.