Scalable Meta-Learning via Mixed-Mode Differentiation
By: Iurii Kemaev, Dan A Calian, Luisa M Zintgraf, and more
Potential Business Impact:
Makes smart computer learning faster while using less memory.
Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation itself, leading to "gradient-of-a-gradient" calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG -- a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25% wall-clock time improvements over standard implementations in modern meta-learning setups.
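To make the "gradient-of-a-gradient" idea concrete, the sketch below contrasts a plain reverse-over-reverse hypergradient with a mixed-mode (forward-over-reverse) variant in JAX. It is only an illustration of the general technique, not the paper's MixFlow-MG algorithm; the toy losses, the single unrolled SGD step, and names such as inner_loss, outer_loss, and lr are assumptions for the example.

# Minimal mixed-mode differentiation sketch (illustrative, not MixFlow-MG).
import jax
import jax.numpy as jnp

lr = 0.1  # inner-loop learning rate (assumed fixed here)

def inner_loss(params, hparam, x):
    # Toy regularised regression loss; hparam acts as an L2 coefficient.
    preds = x @ params
    return jnp.sum((preds - 1.0) ** 2) + hparam * jnp.sum(params ** 2)

def outer_loss(params, x_val):
    # Toy validation loss used as the outer objective.
    return jnp.sum((x_val @ params - 1.0) ** 2)

def inner_step(params, hparam, x):
    # One gradient-descent step of the inner optimisation.
    g = jax.grad(inner_loss)(params, hparam, x)
    return params - lr * g

def hypergrad_reverse(params, hparam, x, x_val):
    # Standard approach: reverse-mode through the whole unrolled inner step
    # ("grad of a grad"), which stores the full second-order graph.
    def objective(h):
        return outer_loss(inner_step(params, h, x), x_val)
    return jax.grad(objective)(hparam)

def hypergrad_mixed(params, hparam, x, x_val):
    # Mixed-mode flavour: the chain rule needs the mixed second derivative
    # d^2 inner_loss / d hparam d params contracted with the outer gradient;
    # obtain it as a forward-mode JVP of a reverse-mode gradient.
    new_params = inner_step(params, hparam, x)
    v = jax.grad(outer_loss)(new_params, x_val)  # outer gradient (cotangent)
    grad_inner = lambda h: jax.grad(inner_loss)(params, h, x)
    _, mixed_term = jax.jvp(grad_inner, (hparam,), (jnp.ones_like(hparam),))
    return -lr * jnp.vdot(v, mixed_term)

# Both functions compute the same hypergradient on this toy problem.
x = jax.random.normal(jax.random.PRNGKey(0), (8, 3))
x_val = jax.random.normal(jax.random.PRNGKey(1), (8, 3))
params = jnp.zeros(3)
hparam = jnp.array(0.5)
print(hypergrad_reverse(params, hparam, x, x_val))
print(hypergrad_mixed(params, hparam, x, x_val))

On larger models the mixed-mode route avoids materialising the full reverse-over-reverse graph, which is the kind of structural saving the abstract's memory and wall-clock figures refer to.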
Similar Papers
Optimizing ML Training with Metagradient Descent
Machine Learning (Stat)
Finds best ways to teach computers faster.
Safe Gradient Flow for Bilevel Optimization
Optimization and Control
Helps make smart decisions when one choice affects another.
Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond
Machine Learning (CS)
Teaches AI to balance many goals at once.