A multilevel approach to accelerate the training of Transformers
By: Guillaume Lauga, Maël Chaumette, Edgar Desainte-Maréville, et al.
Potential Business Impact:
Makes machine learning models train much faster.
In this article, we investigate the potential of multilevel approaches to accelerate the training of transformer architectures. Using an ordinary differential equation (ODE) interpretation of these architectures, we propose an appropriate way of varying the discretization of these ODE Transformers in order to accelerate training. We validate our approach experimentally by comparing it with the standard training procedure.
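
To make the idea concrete, here is a minimal sketch of the ODE view of a residual block and a coarse-to-fine (multilevel) training schedule, assuming the common reading that a depth-L residual network performs L explicit Euler steps of dx/dt = F(x) with step size h = 1/L. The abstract does not specify the paper's actual scheme: the Block/ODENet/prolong names, the MLP standing in for a transformer block, the piecewise-constant prolongation, and the training loop are all illustrative assumptions.

```python
# Sketch only: one assumed instantiation of "vary the discretization
# of an ODE Transformer during training", not the paper's method.
import torch
import torch.nn as nn

class Block(nn.Module):
    """One residual 'ODE' layer: x <- x + h * F(x), i.e. an explicit
    Euler step of dx/dt = F(x). F is a small MLP standing in for a
    full attention + feed-forward transformer block (assumption)."""
    def __init__(self, dim: int, h: float):
        super().__init__()
        self.h = h
        self.f = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim),
                               nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.h * self.f(x)

class ODENet(nn.Module):
    """Depth-L network = L Euler steps over t in [0, 1], step h = 1/L."""
    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim, 1.0 / depth)
                                    for _ in range(depth))

    def forward(self, x):
        for b in self.blocks:
            x = b(x)
        return x

def prolong(coarse: ODENet, dim: int) -> ODENet:
    """Refine the discretization: double the depth, halve the step, and
    initialize each pair of fine blocks from the matching coarse block
    (piecewise-constant prolongation; one assumed choice among many)."""
    fine = ODENet(dim, 2 * len(coarse.blocks))
    for i, fb in enumerate(fine.blocks):
        fb.f.load_state_dict(coarse.blocks[i // 2].f.state_dict())
    return fine

# Usage: train cheaply on the coarse model, then refine and continue.
dim = 64
model = ODENet(dim, depth=2)            # coarse discretization, h = 1/2
for stage in range(3):                  # depth 2 -> 4 -> 8
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(100):                # a few cheap steps per level
        x = torch.randn(32, dim)
        loss = model(x).pow(2).mean()   # placeholder objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    if stage < 2:
        model = prolong(model, dim)     # halve h, double depth
```

The appeal of such a schedule is that most optimization steps are taken on the cheap coarse models, while the prolongation gives the finer models a warm start; how the paper interleaves and transfers between levels goes beyond what the abstract states.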
Similar Papers
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Machine Learning (CS)
Helps analyze what happens inside AI models.
High-order expansion of Neural Ordinary Differential Equations flows
Optimization and Control
Explains in more detail how neural ODE models evolve.
Learning the Simplest Neural ODE
Machine Learning (Stat)
Makes it easier to train computer models of changing systems.