Precise Dynamics of Diagonal Linear Networks: A Unifying Analysis by Dynamical Mean-Field Theory
By: Sota Nishiyama, Masaaki Imaizumi
Potential Business Impact:
Explains how fast neural network training converges and how training speed trades off against generalization.
Diagonal linear networks (DLNs) are a tractable model that captures several nontrivial behaviors in neural network training, such as initialization-dependent solutions and incremental learning. These phenomena are typically studied in isolation, leaving the overall dynamics insufficiently understood. In this work, we present a unified analysis of various phenomena in the gradient flow dynamics of DLNs. Using Dynamical Mean-Field Theory (DMFT), we derive a low-dimensional effective process that captures the asymptotic gradient flow dynamics in high dimensions. Analyzing this effective process yields new insights into DLN dynamics, including loss convergence rates and their trade-off with generalization, and systematically reproduces many of the previously observed phenomena. These findings deepen our understanding of DLNs and demonstrate the effectiveness of the DMFT approach in analyzing high-dimensional learning dynamics of neural networks.
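The abstract studies the gradient flow dynamics of diagonal linear networks. As a rough illustration of the object being analyzed (not the paper's DMFT derivation), the following minimal NumPy sketch simulates discretized gradient flow for the common u ⊙ v parametrization of a DLN on a sparse regression problem; the problem sizes, step size eta, and initialization scale alpha are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse regression instance: y = X @ beta_star (noiseless for simplicity).
n, d, k = 100, 200, 5                      # samples, dimension, true sparsity
X = rng.standard_normal((n, d)) / np.sqrt(n)
beta_star = np.zeros(d)
beta_star[:k] = 1.0
y = X @ beta_star

# Diagonal linear network: beta = u * v (elementwise product of two weight
# vectors). Gradient flow on 0.5 * ||y - X @ beta||^2 is discretized with
# small-step Euler; alpha is the (assumed) initialization scale.
alpha, eta, steps = 1e-3, 0.05, 50_000
u = alpha * np.ones(d)
v = alpha * np.ones(d)

for t in range(steps):
    beta = u * v
    residual = X @ beta - y
    grad_beta = X.T @ residual             # gradient of the loss w.r.t. beta
    # Chain rule: dL/du = grad_beta * v, dL/dv = grad_beta * u.
    u, v = u - eta * grad_beta * v, v - eta * grad_beta * u
    if t % 10_000 == 0:
        print(f"step {t:6d}  loss {0.5 * residual @ residual:.3e}")

print("recovery error:", np.linalg.norm(u * v - beta_star))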
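Varying alpha in this sketch illustrates the initialization dependence the abstract refers to: in line with known results on DLNs, small alpha biases gradient flow toward a sparse (ℓ1-like) interpolator, while large alpha yields a dense, ℓ2-like one, and the loss curve shows the plateau-then-drop pattern characteristic of incremental learning.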
Similar Papers
Network Dynamics-Based Framework for Understanding Deep Neural Networks
Machine Learning (CS)
Proposes a network-dynamics framework for understanding how deep neural networks learn.
Gradient Flow Equations for Deep Linear Neural Networks: A Survey from a Network Perspective
Machine Learning (CS)
Surveys the gradient flow equations that govern training in deep linear networks.
Diagonal Linear Networks and the Lasso Regularization Path
Machine Learning (CS)
Connects the training of these networks to the Lasso, a classic sparse regression method.