Scalable Meta-Learning via Mixed-Mode Differentiation
By: Iurii Kemaev, Dan A Calian, Luisa M Zintgraf, and more
Potential Business Impact:
Makes smart computer learning faster while using less memory.
Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation itself, leading to "gradient-of-a-gradient" calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG -- a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25% wall-clock time improvements over standard implementations in modern meta-learning setups.
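To make the "gradient-of-a-gradient" idea concrete, the sketch below contrasts a plain reverse-over-reverse hypergradient with a mixed-mode (forward-over-reverse) variant in JAX. It is only an illustration of the general technique, not the paper's MixFlow-MG algorithm; the toy losses, the single unrolled SGD step, and names such as inner_loss, outer_loss, and lr are assumptions for the example.

# Minimal mixed-mode differentiation sketch (illustrative, not MixFlow-MG).
import jax
import jax.numpy as jnp

lr = 0.1  # inner-loop learning rate (assumed fixed here)

def inner_loss(params, hparam, x):
    # Toy regularised regression loss; hparam acts as an L2 coefficient.
    preds = x @ params
    return jnp.sum((preds - 1.0) ** 2) + hparam * jnp.sum(params ** 2)

def outer_loss(params, x_val):
    # Toy validation loss used as the outer objective.
    return jnp.sum((x_val @ params - 1.0) ** 2)

def inner_step(params, hparam, x):
    # One gradient-descent step of the inner optimisation.
    g = jax.grad(inner_loss)(params, hparam, x)
    return params - lr * g

def hypergrad_reverse(params, hparam, x, x_val):
    # Standard approach: reverse-mode through the whole unrolled inner step
    # ("grad of a grad"), which stores the full second-order graph.
    def objective(h):
        return outer_loss(inner_step(params, h, x), x_val)
    return jax.grad(objective)(hparam)

def hypergrad_mixed(params, hparam, x, x_val):
    # Mixed-mode flavour: the chain rule needs the mixed second derivative
    # d^2 inner_loss / d hparam d params contracted with the outer gradient;
    # obtain it as a forward-mode JVP of a reverse-mode gradient.
    new_params = inner_step(params, hparam, x)
    v = jax.grad(outer_loss)(new_params, x_val)  # outer gradient (cotangent)
    grad_inner = lambda h: jax.grad(inner_loss)(params, h, x)
    _, mixed_term = jax.jvp(grad_inner, (hparam,), (jnp.ones_like(hparam),))
    return -lr * jnp.vdot(v, mixed_term)

# Both functions compute the same hypergradient on this toy problem.
x = jax.random.normal(jax.random.PRNGKey(0), (8, 3))
x_val = jax.random.normal(jax.random.PRNGKey(1), (8, 3))
params = jnp.zeros(3)
hparam = jnp.array(0.5)
print(hypergrad_reverse(params, hparam, x, x_val))
print(hypergrad_mixed(params, hparam, x, x_val))

On larger models the mixed-mode route avoids materialising the full reverse-over-reverse graph, which is the kind of structural saving the abstract's memory and wall-clock figures refer to.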
Similar Papers
Optimizing ML Training with Metagradient Descent
Machine Learning (Stat)
Finds best ways to teach computers faster.
Safe Gradient Flow for Bilevel Optimization
Optimization and Control
Helps make smart decisions when one choice affects another.
Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond
Machine Learning (CS)
Teaches AI to balance many goals at once.