Diagonal Linear Networks and the Lasso Regularization Path
By: Raphaël Berthier
Potential Business Impact:
Training a simple neural network behaves like a well-known statistical shortcut.
Diagonal linear networks are neural networks with linear activations and diagonal weight matrices. Their theoretical interest is that their implicit regularization can be rigorously analyzed: from a small initialization, the training of diagonal linear networks converges to the linear predictor with minimal 1-norm among minimizers of the training loss. In this paper, we deepen this analysis by showing that the full training trajectory of diagonal linear networks is closely related to the lasso regularization path. In this connection, training time plays the role of an inverse regularization parameter. Both rigorous results and simulations are provided to illustrate this conclusion. Under a monotonicity assumption on the lasso regularization path, the connection is exact, while in the general case we show an approximate connection.
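To make the abstract's claim concrete, here is a minimal sketch (not the authors' code) of the kind of simulation involved: it trains a two-layer diagonal linear network, parameterized as beta = u*u - v*v, by gradient descent on the squared loss from a small initialization, and compares snapshots of the trajectory with lasso solutions whose regularization strength scales like the inverse of training time. The data, step size, initialization scale, and the time-to-lambda mapping are illustrative assumptions, not values from the paper.

```python
# Sketch: gradient-descent trajectory of a diagonal linear network vs. the lasso path.
# Assumptions (hypothetical, for illustration): squared loss, parameterization
# beta = u^2 - v^2, small initialization alpha0, heuristic lambda ~ 1 / time mapping.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, k = 40, 100, 5                        # samples, features, true sparsity
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:k] = rng.standard_normal(k)
y = X @ beta_true

alpha0, lr, T, every = 1e-3, 1e-3, 200_000, 20_000   # init scale, step size, iterations
u = np.full(d, alpha0)
v = np.full(d, alpha0)

snapshots = []
for t in range(T):
    beta = u * u - v * v                    # effective linear predictor
    grad = X.T @ (X @ beta - y) / n         # gradient of the squared loss w.r.t. beta
    u -= lr * 2 * u * grad                  # chain rule through beta = u^2 - v^2
    v += lr * 2 * v * grad
    if t % every == 0:
        snapshots.append((t, beta.copy()))

# Compare each snapshot with a lasso solution at a matching regularization level;
# the mapping lambda ~ 1 / (training time) is a heuristic stand-in for the paper's
# precise correspondence.
for t, beta_t in snapshots:
    lam = 1.0 / (lr * (t + 1))
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=50_000).fit(X, y)
    print(f"t={t:>7d}  ||beta(t) - lasso(lambda)||_2 = "
          f"{np.linalg.norm(beta_t - lasso.coef_):.3f}")
```

With a small enough initialization and step size, the printed distances stay small along the trajectory, illustrating how training time acts as an inverse regularization parameter.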
Similar Papers
Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization
Machine Learning (CS)
Makes computer learning faster and more accurate.
Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization
Machine Learning (Stat)
Deeper AI learns better from less data.
Precise Dynamics of Diagonal Linear Networks: A Unifying Analysis by Dynamical Mean-Field Theory
Machine Learning (Stat)
Explains how computer learning gets smarter faster.