Models of Heavy-Tailed Mechanistic Universality

Published: June 4, 2025 | arXiv ID: 2506.03470v1

By: Liam Hodgkinson, Zhichao Wang, Michael W. Mahoney

Potential Business Impact:

Offers a model of why trained neural networks develop heavy-tailed spectra, a property that correlates with model performance.

Business Areas:

Multi-level Marketing Sales and Marketing

Technical Abstract

Recent theoretical and empirical successes in deep learning, including the celebrated neural scaling laws, are punctuated by the observation that many objects of interest tend to exhibit some form of heavy-tailed or power law behavior. In particular, the prevalence of heavy-tailed spectral densities in Jacobians, Hessians, and weight matrices has led to the introduction of the concept of heavy-tailed mechanistic universality (HT-MU). Multiple lines of empirical evidence suggest a robust correlation between heavy-tailed metrics and model performance, indicating that HT-MU may be a fundamental aspect of deep learning efficacy. Here, we propose a general family of random matrix models -- the high-temperature Marchenko-Pastur (HTMP) ensemble -- to explore attributes that give rise to heavy-tailed behavior in trained neural networks. Under this model, spectral densities with power laws on (upper and lower) tails arise through a combination of three independent factors (complex correlation structures in the data; reduced temperatures during training; and reduced eigenvector entropy), appearing as an implicit bias in the model structure, and they can be controlled with an "eigenvalue repulsion" parameter. Implications of our model on other appearances of heavy tails, including neural scaling laws, optimizer trajectories, and the five-plus-one phases of neural network training, are discussed.
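
For orientation, here is a minimal sketch of the two ingredients the abstract contrasts: the classical Marchenko-Pastur bulk and a power-law tail. It does not implement the HTMP ensemble from the paper. It only shows the light-tailed baseline (eigenvalues of a sample covariance matrix built from i.i.d. Gaussian data) and the Hill estimator, one common way to estimate a tail index, which may not be the metric the authors use. The function names `mp_edges`, `sample_cov_eigs`, and `hill_estimator`, the matrix sizes, and the synthetic Pareto data are all illustrative assumptions.

```python
import numpy as np


def mp_edges(gamma, sigma2=1.0):
    """Bulk edges of the classical Marchenko-Pastur law for aspect ratio gamma = p / n."""
    root = np.sqrt(gamma)
    return sigma2 * (1.0 - root) ** 2, sigma2 * (1.0 + root) ** 2


def sample_cov_eigs(n, p, rng):
    """Eigenvalues of X^T X / n for an n x p matrix X with i.i.d. standard normal entries."""
    X = rng.standard_normal((n, p))
    return np.linalg.eigvalsh(X.T @ X / n)


def hill_estimator(values, k):
    """Hill estimate of the tail index alpha from the k largest positive values."""
    x = np.sort(np.asarray(values))[::-1]  # descending order
    return k / np.sum(np.log(x[:k] / x[k]))


rng = np.random.default_rng(0)

# Light-tailed baseline: the spectrum stays close to the Marchenko-Pastur bulk.
n, p = 2000, 500
eigs = sample_cov_eigs(n, p, rng)
lo, hi = mp_edges(p / n)
print(f"MP edges: [{lo:.3f}, {hi:.3f}]")
print(f"empirical range: [{eigs.min():.3f}, {eigs.max():.3f}]")

# Heavy-tailed reference: synthetic Pareto samples with a known tail index.
true_alpha = 2.0
pareto_samples = rng.pareto(true_alpha, size=100_000) + 1.0  # classical Pareto, x_m = 1
alpha_hat = hill_estimator(pareto_samples, k=2000)
print(f"Hill estimate of alpha: {alpha_hat:.2f} (true value {true_alpha})")
```

In this baseline the eigenvalues have a bounded bulk with no power-law tail. A heavy-tailed spectrum of the kind the abstract describes would instead place noticeable mass far beyond the upper edge, which a tail-index estimate such as the one above is meant to quantify.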

Inductive Bias and Spectral Properties of Single-Head Attention in High Dimensions

Machine Learning (Stat)

Helps AI learn better by understanding how it works.

29 Sep 2025 1

Approximating Heavy-Tailed Distributions with a Mixture of Bernstein Phase-Type and Hyperexponential Models

Performance

Makes computer models better at predicting rare events.

30 Oct 2025 1

Pruning Deep Neural Networks via a Combination of the Marchenko-Pastur Distribution and Regularization

Machine Learning (CS)

Makes computer vision models smaller, faster.

2 Mar 2025 2

Country of Origin

🇦🇺 Australia

Page Count

40 pages
