Sharp Minima Can Generalize: A Loss Landscape Perspective On Data
By: Raymond Fan, Bryce Sandlund, Lin Myat Ko
Potential Business Impact:
More training data reshapes the loss landscape, helping models find solutions that generalize to new examples.
The volume hypothesis suggests deep learning is effective because it is likely to find flat minima, owing to their large volumes, and flat minima generalize well. This picture does not explain the role of large datasets in generalization. Measuring minima volumes under varying amounts of training data reveals that sharp minima which generalize well do exist, but are unlikely to be found because of their small volumes. Increasing the amount of data changes the loss landscape so that generalizing minima that were previously small become (relatively) large.
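As a rough illustration of what "measuring minima volumes" can mean in practice, here is a minimal sketch in PyTorch: it estimates the relative basin size around a trained minimum by sampling random weight perturbations and counting how often the loss stays near its value at the minimum. All names, radii, and thresholds below are illustrative assumptions, not the paper's actual methodology.

```python
import torch

def basin_volume_fraction(model, loss_fn, data, targets,
                          radius=0.05, n_samples=200, tol=0.1):
    """Monte Carlo estimate of the fraction of a weight-space ball of
    `radius` around the current minimum whose loss stays within `tol`
    of the loss at the minimum. Larger fractions suggest a flatter,
    larger basin. Hyperparameters here are hypothetical."""
    # Snapshot the weights at the minimum so we can restore them.
    params = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        base_loss = loss_fn(model(data), targets).item()
        inside = 0
        for _ in range(n_samples):
            # Perturb each parameter tensor by uniform noise in a box
            # of half-width `radius` around the minimum.
            for p, p0 in zip(model.parameters(), params):
                p.copy_(p0 + radius * (2 * torch.rand_like(p0) - 1))
            if loss_fn(model(data), targets).item() <= base_loss + tol:
                inside += 1
        # Restore the original weights.
        for p, p0 in zip(model.parameters(), params):
            p.copy_(p0)
    return inside / n_samples
```

Under these assumptions, a sharp minimum would yield a small fraction at a given radius; repeating the measurement as the training set grows is one crude way to watch how the relative sizes of generalizing basins change with data.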
Similar Papers
A Function Centric Perspective On Flat and Sharp Minima
Machine Learning (CS)
Sharpness can make AI smarter and safer.
Flat Minima and Generalization: Insights from Stochastic Convex Optimization
Machine Learning (CS)
Makes computers learn better, even when they're wrong.
When Flatness Does (Not) Guarantee Adversarial Robustness
Machine Learning (CS)
Makes AI less fooled by tricky mistakes.