Towards Quantifying the Hessian Structure of Neural Networks
By: Zhaorui Dong, Yushun Zhang, Jianfeng Yao, and more
Potential Business Impact:
Explains why computer brains learn better with many choices.
Empirical studies have reported that the Hessian matrix of neural networks (NNs) exhibits a near-block-diagonal structure, yet its theoretical foundation remains unclear. In this work, we reveal that the reported Hessian structure comes from a mixture of two forces: a ``static force'' rooted in the architecture design, and a ``dynamic force'' arising from training. We then provide a rigorous theoretical analysis of the ``static force'' at random initialization. We study linear models and 1-hidden-layer networks for classification tasks with $C$ classes. By leveraging random matrix theory, we compare the limiting distributions of the diagonal and off-diagonal Hessian blocks and find that the block-diagonal structure arises as $C$ becomes large. Our findings reveal that $C$ is a primary driver of the near-block-diagonal structure. These results may shed new light on the Hessian structure of large language models (LLMs), which typically operate with a large $C$ exceeding $10^4$.
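As a rough illustration of the mechanism described in the abstract (not the paper's own analysis or code): for a linear softmax classifier on a single input $x$, the Hessian block coupling classes $c$ and $c'$ is $(\delta_{cc'}p_c - p_c p_{c'})\,xx^\top$. At random initialization the class probabilities are near $1/C$, so diagonal blocks scale roughly like $1/C$ while off-diagonal blocks scale like $1/C^2$, and the block-diagonal picture sharpens as $C$ grows. The sketch below estimates per-sample block magnitudes numerically; the dimensions, sample counts, and the helper `block_norms` are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (assumed setup, not the paper's experiments): compare the
# typical Frobenius norm of diagonal vs. off-diagonal Hessian blocks of the
# cross-entropy loss for a randomly initialized linear softmax classifier.
import numpy as np

def block_norms(C, d=32, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((C, d)) / np.sqrt(d)   # random initialization
    diag_norm, offdiag_norm = 0.0, 0.0
    for _ in range(n_samples):
        x = rng.standard_normal(d)
        p = np.exp(W @ x)
        p /= p.sum()                               # softmax probabilities
        A = np.diag(p) - np.outer(p, p)            # class-coupling matrix
        xx_norm = np.linalg.norm(np.outer(x, x))
        # Per-sample Hessian block (c, c') is A[c, c'] * x x^T, so its
        # Frobenius norm is |A[c, c']| * ||x x^T||_F (a per-sample proxy,
        # not the sample-averaged block).
        diag_norm += np.abs(np.diag(A)).mean() * xx_norm
        offdiag_norm += (np.abs(A).sum() - np.abs(np.diag(A)).sum()) \
                        / (C * (C - 1)) * xx_norm
    return diag_norm / n_samples, offdiag_norm / n_samples

for C in [2, 10, 100, 1000]:
    d_n, o_n = block_norms(C)
    print(f"C={C:5d}  diag ~ {d_n:.4f}  off-diag ~ {o_n:.6f}  ratio ~ {o_n / d_n:.4f}")
```

Running this, the off-diagonal-to-diagonal ratio should shrink roughly like $1/C$, consistent with the abstract's claim that a large number of classes drives the near-block-diagonal structure.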
Similar Papers
A Teacher-Student Perspective on the Dynamics of Learning Near the Optimal Point
Machine Learning (Stat)
Finds best way for AI to learn faster.
Dynamics of Structured Complex-Valued Hopfield Neural Networks
Neural and Evolutionary Computing
Makes computer memories remember more things.
On the Stability of the Jacobian Matrix in Deep Neural Networks
Machine Learning (CS)
Makes smart computer programs learn better.