Optimizing ML Training with Metagradient Descent
By: Logan Engstrom, Andrew Ilyas, Benjamin Chen, and more
Potential Business Impact:
Finds the best ways to set up model training so computers learn faster and perform better.
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
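To give a rough, hypothetical sense of what a metagradient is, the sketch below uses JAX to differentiate a validation loss through a short, unrolled SGD training loop with respect to per-example training weights. This is a minimal illustration of naive unrolled differentiation, not the paper's scalable algorithm or its smooth model training framework; the model, data, and all variable names are placeholder assumptions.

```python
# Illustrative sketch only: a "metagradient" as the gradient of a
# post-training validation loss with respect to a meta-parameter
# (here, per-example training weights), taken through the training loop itself.
import jax
import jax.numpy as jnp

def val_loss(params, x, y):
    # Simple linear model; mean squared error on held-out data.
    preds = x @ params
    return jnp.mean((preds - y) ** 2)

def train_then_validate(data_weights, params, train_x, train_y, val_x, val_y,
                        lr=0.1, steps=10):
    # Weighted training objective; data_weights is the meta-parameter
    # we want the metagradient with respect to.
    def weighted_train_loss(p):
        per_example = (train_x @ p - train_y) ** 2
        return jnp.mean(data_weights * per_example)

    # Unrolled SGD: every update stays inside the autodiff graph.
    for _ in range(steps):
        grads = jax.grad(weighted_train_loss)(params)
        params = params - lr * grads

    # The scalar we ultimately care about: validation loss after training.
    return val_loss(params, val_x, val_y)

# Metagradient: d(validation loss) / d(data_weights), through all training steps.
metagrad_fn = jax.grad(train_then_validate, argnums=0)

key = jax.random.PRNGKey(0)
train_x = jax.random.normal(key, (32, 5))
train_y = train_x @ jnp.ones(5)
val_x, val_y = train_x[:8], train_y[:8]

metagrad = metagrad_fn(jnp.ones(32), jnp.zeros(5), train_x, train_y, val_x, val_y)
print(metagrad.shape)  # (32,): one entry per training example
```

In an MGD-style application such as dataset selection, a gradient like this one would suggest which training examples to up- or down-weight; the paper's contribution is making this kind of computation tractable and well-behaved at the scale of real model training rather than a toy unrolled loop.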
Similar Papers
Scalable Meta-Learning via Mixed-Mode Differentiation
Machine Learning (CS)
Makes smart computer learning faster while using less memory.
Scaling of hardware-compatible perturbative training algorithms
Machine Learning (CS)
Trains big computer brains faster, even on new chips.
Efficient End-to-End Learning for Decision-Making: A Meta-Optimization Approach
Machine Learning (CS)
Teaches computers to solve hard problems faster.