Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks
By: Francesco D'Amico, Dario Bocchi, Matteo Negri
Potential Business Impact:
Helps AI learn better by watching how it trains.
Scaling laws in deep learning, the empirical power-law relationships linking model performance to resource growth, have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly influential in guiding the design of state-of-the-art models, since they quantify the benefits of increasing data or model size and hint at the foundations of interpretability in machine learning. However, most studies focus on asymptotic behavior at the end of training or on the optimal training time given the model size. In this work, we uncover a richer picture by analyzing the entire training dynamics through the lens of spectral complexity norms. We identify two novel dynamical scaling laws that govern how performance evolves during training. Together, these laws recover the well-known test-error scaling at convergence, offering a mechanistic explanation of how generalization emerges. Our findings are consistent across CNNs, ResNets, and Vision Transformers trained on MNIST, CIFAR-10, and CIFAR-100. Furthermore, we provide analytical support using a solvable model: a single-layer perceptron trained with binary cross-entropy. In this setting, we show that the growth of spectral complexity driven by the implicit bias mirrors the generalization behavior observed at fixed norm, allowing us to connect the performance dynamics to classical learning rules in the perceptron.
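The solvable perceptron setting lends itself to a quick illustration. Below is a minimal sketch, not the authors' code: a single-layer perceptron trained with binary cross-entropy by plain gradient descent on synthetic teacher-student data. The data model, dimensions, learning rate, and step counts are all illustrative assumptions. On linearly separable data the BCE loss has no finite minimizer, so the implicit bias of gradient descent keeps growing the weight norm (which, for a one-layer network, is the spectral complexity) while the direction of the weights, and hence the test error, settles; this is the coupling between norm dynamics and generalization that the abstract describes.

```python
import numpy as np

# Minimal sketch (not the paper's code): single-layer perceptron trained
# with binary cross-entropy on synthetic, linearly separable teacher-student
# data. All sizes and hyperparameters are illustrative assumptions.
rng = np.random.default_rng(0)
d, n_train, n_test = 100, 200, 2000

teacher = rng.standard_normal(d)
teacher /= np.linalg.norm(teacher)           # ground-truth direction
X_train = rng.standard_normal((n_train, d))
y_train = np.sign(X_train @ teacher)         # labels in {-1, +1}
X_test = rng.standard_normal((n_test, d))
y_test = np.sign(X_test @ teacher)

def bce_grad(w, X, y):
    """Gradient of mean log(1 + exp(-y * Xw)), i.e. BCE with +/-1 labels."""
    margins = y * (X @ w)
    # sigmoid(-margin) per sample; clip to avoid overflow at large margins
    weights = 1.0 / (1.0 + np.exp(np.clip(margins, -30.0, 30.0)))
    return -(X * (y * weights)[:, None]).mean(axis=0)

w = np.zeros(d)
lr = 0.5
for t in range(1, 10_001):
    w -= lr * bce_grad(w, X_train, y_train)
    if t in (10, 100, 1_000, 10_000):
        # For a one-layer net the spectral complexity reduces to ||w||.
        # On separable data this norm keeps growing (implicit bias),
        # while the direction of w, and the test error, stabilizes.
        test_err = np.mean(np.sign(X_test @ w) != y_test)
        print(f"step {t:>6}: ||w|| = {np.linalg.norm(w):7.3f}, "
              f"test error = {test_err:.3f}")
```

In this separable setting, the norm is known to grow roughly logarithmically in time as gradient descent drifts toward the max-margin direction, which is the kind of norm dynamics the abstract ties to the evolution of generalization during training.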
Similar Papers
Neural Scaling Laws for Deep Regression
Machine Learning (CS)
Improves computer predictions with more data.
The Operator Origins of Neural Scaling Laws: A Generalized Spectral Transport Dynamics of Deep Learning
Machine Learning (CS)
Makes AI learn faster and better.
Scaling Laws are Redundancy Laws
Machine Learning (CS)
Explains why bigger computer brains learn faster.