Optimizing ML Training with Metagradient Descent
By: Logan Engstrom, Andrew Ilyas, Benjamin Chen, and more
Potential Business Impact:
Finds the best ways to set up model training so computers learn faster and perform better.
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
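To give a rough, hypothetical sense of what a metagradient is, the sketch below uses JAX to differentiate a validation loss through a short, unrolled SGD training loop with respect to per-example training weights. This is a minimal illustration of naive unrolled differentiation, not the paper's scalable algorithm or its smooth model training framework; the model, data, and all variable names are placeholder assumptions.

```python
# Illustrative sketch only: a "metagradient" as the gradient of a
# post-training validation loss with respect to a meta-parameter
# (here, per-example training weights), taken through the training loop itself.
import jax
import jax.numpy as jnp

def val_loss(params, x, y):
    # Simple linear model; mean squared error on held-out data.
    preds = x @ params
    return jnp.mean((preds - y) ** 2)

def train_then_validate(data_weights, params, train_x, train_y, val_x, val_y,
                        lr=0.1, steps=10):
    # Weighted training objective; data_weights is the meta-parameter
    # we want the metagradient with respect to.
    def weighted_train_loss(p):
        per_example = (train_x @ p - train_y) ** 2
        return jnp.mean(data_weights * per_example)

    # Unrolled SGD: every update stays inside the autodiff graph.
    for _ in range(steps):
        grads = jax.grad(weighted_train_loss)(params)
        params = params - lr * grads

    # The scalar we ultimately care about: validation loss after training.
    return val_loss(params, val_x, val_y)

# Metagradient: d(validation loss) / d(data_weights), through all training steps.
metagrad_fn = jax.grad(train_then_validate, argnums=0)

key = jax.random.PRNGKey(0)
train_x = jax.random.normal(key, (32, 5))
train_y = train_x @ jnp.ones(5)
val_x, val_y = train_x[:8], train_y[:8]

metagrad = metagrad_fn(jnp.ones(32), jnp.zeros(5), train_x, train_y, val_x, val_y)
print(metagrad.shape)  # (32,): one entry per training example
```

In an MGD-style application such as dataset selection, a gradient like this one would suggest which training examples to up- or down-weight; the paper's contribution is making this kind of computation tractable and well-behaved at the scale of real model training rather than a toy unrolled loop.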
Similar Papers
Scalable Meta-Learning via Mixed-Mode Differentiation
Machine Learning (CS)
Makes smart computer learning faster while using less memory.
Scaling of hardware-compatible perturbative training algorithms
Machine Learning (CS)
Trains big computer brains faster, even on new chips.
Efficient End-to-End Learning for Decision-Making: A Meta-Optimization Approach
Machine Learning (CS)
Teaches computers to solve hard problems faster.