Gradient flow for deep equilibrium single-index models
By: Sanjit Dandapanthula, Aaditya Ramdas
Potential Business Impact:
Makes super-deep computer brains learn faster.
Deep equilibrium models (DEQs) have recently emerged as a powerful paradigm for training infinitely deep weight-tied neural networks that achieve state-of-the-art performance across many modern machine learning tasks. Despite their practical success, theoretically understanding the gradient descent dynamics for training DEQs remains an area of active research. In this work, we rigorously study the gradient descent dynamics for DEQs in the simple setting of linear models and single-index models, filling several gaps in the literature. We prove a conservation law for linear DEQs, which implies that the parameters remain trapped on spheres during training, and we use this property to show that gradient flow remains well-conditioned for all time. We then prove linear convergence of gradient descent to a global minimizer for linear DEQs and deep equilibrium single-index models under appropriate initialization and with a sufficiently small step size. Finally, we validate our theoretical findings through experiments.
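To make the setting concrete: a DEQ defines its output implicitly as the fixed point of a weight-tied layer, which for a linear model means z* = W z* + U x. The sketch below (an illustration assuming NumPy, with made-up dimensions and a spectral-norm condition chosen so the iteration converges; it is not the authors' code) computes this fixed point by iteration and checks it against the closed form (I − W)⁻¹ U x.

```python
import numpy as np

rng = np.random.default_rng(0)

d, p = 5, 3  # hidden and input dimensions (illustrative choices)

# For the fixed-point iteration to converge, the spectral norm of W
# must be below 1; we rescale a random matrix to norm 0.5.
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)
U = rng.standard_normal((d, p))
x = rng.standard_normal(p)

# A linear DEQ's output is the fixed point z* of z -> W z + U x.
z = np.zeros(d)
for _ in range(200):
    z = W @ z + U @ x

# Closed form for the same fixed point: z* = (I - W)^{-1} U x.
z_closed = np.linalg.solve(np.eye(d) - W, U @ x)
print(np.allclose(z, z_closed))  # iteration matches the closed form
```

This is the sense in which a DEQ behaves like an "infinitely deep" weight-tied network: unrolling the layer forever converges to the same z* that the implicit equation defines directly.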
Similar Papers
Reversible Deep Equilibrium Models
Machine Learning (CS)
Makes AI learn better with fewer steps.
DDEQs: Distributional Deep Equilibrium Models through Wasserstein Gradient Flows
Machine Learning (CS)
Helps computers understand shapes and groups of dots.
Gradient Flow Equations for Deep Linear Neural Networks: A Survey from a Network Perspective
Machine Learning (CS)
Helps computers learn by simplifying math.