Sobolev acceleration for neural networks
By: Jong Kwon Oh, Hanbaek Lyu, Hwijae Son
Potential Business Impact:
Trains neural networks faster and to higher accuracy.
Sobolev training, which integrates target derivatives into the loss function, has been shown to accelerate convergence and improve generalization compared to conventional $L^2$ training. However, the underlying mechanisms of this training method remain only partially understood. In this work, we present the first rigorous theoretical framework proving that Sobolev training accelerates the convergence of Rectified Linear Unit (ReLU) networks. Under a student-teacher framework with Gaussian inputs and shallow architectures, we derive exact formulas for the population gradients and Hessians, and quantify the resulting improvements in the conditioning of the loss landscape and in gradient-flow convergence rates. Extensive numerical experiments validate our theoretical findings and show that the benefits of Sobolev training extend to modern deep learning tasks.
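To make the objective concrete, below is a minimal sketch (not the authors' code) of Sobolev ($H^1$) training in PyTorch, mirroring the abstract's setup: a shallow ReLU student fits a frozen teacher on Gaussian inputs, penalizing mismatch in both outputs and input gradients. The network sizes, learning rate, and derivative weight `lam` are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
d, width, n = 8, 32, 256  # input dim, hidden width, batch size (illustrative)

teacher = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1))
student = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1))
for p in teacher.parameters():  # freeze the teacher network
    p.requires_grad_(False)

opt = torch.optim.SGD(student.parameters(), lr=1e-2)
lam = 1.0  # hypothetical weight on the derivative term

for step in range(1000):
    x = torch.randn(n, d, requires_grad=True)  # Gaussian inputs

    y_t = teacher(x)
    # Teacher's input gradient; no further graph needed for it.
    g_t = torch.autograd.grad(y_t.sum(), x)[0]

    y_s = student(x)
    # Student's input gradient; keep the graph so the loss stays
    # differentiable with respect to the student's parameters.
    g_s = torch.autograd.grad(y_s.sum(), x, create_graph=True)[0]

    # Sobolev (H^1) loss: value mismatch plus input-gradient mismatch.
    loss = ((y_s - y_t.detach()) ** 2).mean() + lam * ((g_s - g_t) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```

Setting `lam = 0` recovers conventional $L^2$ training, so the same loop can be used to compare convergence under the two objectives.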
Similar Papers
Sobolev Training of End-to-End Optimization Proxies
Machine Learning (CS)
Makes learned approximations of hard optimization problems faster and more accurate.
Integral Representations of Sobolev Spaces via ReLU$^k$ Activation Function and Optimal Error Estimates for Linearized Networks
Numerical Analysis
Helps networks approximate mathematical functions with provably small error.
Nonlocal techniques for the analysis of deep ReLU neural network approximations
Machine Learning (CS)
Helps explain how deep ReLU networks approximate complex functions efficiently.