Diagonal Linear Networks and the Lasso Regularization Path
By: Raphaël Berthier
Potential Business Impact:
Training a simple neural network behaves like a well-known statistical shortcut.
Diagonal linear networks are neural networks with linear activations and diagonal weight matrices. Their theoretical interest is that their implicit regularization can be rigorously analyzed: from a small initialization, the training of diagonal linear networks converges to the linear predictor with minimal 1-norm among minimizers of the training loss. In this paper, we deepen this analysis by showing that the full training trajectory of diagonal linear networks is closely related to the lasso regularization path. In this connection, training time plays the role of an inverse regularization parameter. Both rigorous results and simulations are provided to illustrate this conclusion. Under a monotonicity assumption on the lasso regularization path, the connection is exact, while in the general case we show an approximate connection.
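To make the abstract's claim concrete, here is a minimal sketch (not the authors' code) of the kind of simulation involved: it trains a two-layer diagonal linear network, parameterized as beta = u*u - v*v, by gradient descent on the squared loss from a small initialization, and compares snapshots of the trajectory with lasso solutions whose regularization strength scales like the inverse of training time. The data, step size, initialization scale, and the time-to-lambda mapping are illustrative assumptions, not values from the paper.

```python
# Sketch: gradient-descent trajectory of a diagonal linear network vs. the lasso path.
# Assumptions (hypothetical, for illustration): squared loss, parameterization
# beta = u^2 - v^2, small initialization alpha0, heuristic lambda ~ 1 / time mapping.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, k = 40, 100, 5                        # samples, features, true sparsity
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:k] = rng.standard_normal(k)
y = X @ beta_true

alpha0, lr, T, every = 1e-3, 1e-3, 200_000, 20_000   # init scale, step size, iterations
u = np.full(d, alpha0)
v = np.full(d, alpha0)

snapshots = []
for t in range(T):
    beta = u * u - v * v                    # effective linear predictor
    grad = X.T @ (X @ beta - y) / n         # gradient of the squared loss w.r.t. beta
    u -= lr * 2 * u * grad                  # chain rule through beta = u^2 - v^2
    v += lr * 2 * v * grad
    if t % every == 0:
        snapshots.append((t, beta.copy()))

# Compare each snapshot with a lasso solution at a matching regularization level;
# the mapping lambda ~ 1 / (training time) is a heuristic stand-in for the paper's
# precise correspondence.
for t, beta_t in snapshots:
    lam = 1.0 / (lr * (t + 1))
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=50_000).fit(X, y)
    print(f"t={t:>7d}  ||beta(t) - lasso(lambda)||_2 = "
          f"{np.linalg.norm(beta_t - lasso.coef_):.3f}")
```

With a small enough initialization and step size, the printed distances stay small along the trajectory, illustrating how training time acts as an inverse regularization parameter.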
Similar Papers
Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization
Machine Learning (CS)
Makes computer learning faster and more accurate.
Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization
Machine Learning (Stat)
Deeper AI learns better from less data.
Precise Dynamics of Diagonal Linear Networks: A Unifying Analysis by Dynamical Mean-Field Theory
Machine Learning (Stat)
Explains how computer learning gets smarter faster.