Training NTK to Generalize with KARE
By: Johannes Schwab, Bryan Kelly, Semyon Malamud, and more
Potential Business Impact:
Trains the model's kernel directly so that it generalizes better.
The performance of the data-dependent neural tangent kernel (NTK; Jacot et al. (2018)) associated with a trained deep neural network (DNN) often matches or exceeds that of the full network. This implies that DNN training via gradient descent implicitly performs kernel learning by optimizing the NTK. In this paper, we propose instead to optimize the NTK explicitly. Rather than minimizing empirical risk, we train the NTK to minimize its generalization error using the recently developed Kernel Alignment Risk Estimator (KARE; Jacot et al. (2020)). Our simulations and real-data experiments show that NTKs trained with KARE consistently match or significantly outperform the original DNN and the DNN-induced NTK (the after-kernel). These results suggest that explicitly trained kernels can outperform traditional end-to-end DNN optimization in certain settings, challenging the conventional dominance of DNNs. We argue that explicit training of the NTK is a form of over-parametrized feature learning.
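To make the idea concrete, below is a minimal sketch in JAX of the procedure the abstract describes: compute the empirical NTK of a network and run gradient descent on the network parameters to minimize the KARE objective of Jacot et al. (2020) rather than the empirical risk. This is not the authors' implementation; the network architecture, synthetic data, ridge parameter `lam`, learning rate, and step count are all illustrative assumptions.

```python
# Sketch: training the empirical NTK by minimizing KARE (illustrative, not the authors' code).
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, dims=(10, 64, 64, 1)):
    """Small fully connected network (architecture chosen for illustration)."""
    params = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def forward(params, x):
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze(-1)

def ntk_matrix(params, X):
    """Empirical NTK: K_ij = <grad_theta f(x_i), grad_theta f(x_j)>."""
    flat_grad = lambda x: ravel_pytree(
        jax.grad(lambda p: forward(p, x[None, :])[0])(params))[0]
    J = jax.vmap(flat_grad)(X)          # (n, n_params) Jacobian of outputs w.r.t. parameters
    return J @ J.T

def kare(params, X, y, lam=1e-3):
    """Kernel Alignment Risk Estimator (Jacot et al., 2020) for the empirical NTK."""
    n = X.shape[0]
    A = ntk_matrix(params, X) / n + lam * jnp.eye(n)
    A_inv = jnp.linalg.inv(A)
    num = (y @ (A_inv @ A_inv) @ y) / n            # (1/n) y^T (K/n + lam I)^{-2} y
    den = (jnp.trace(A_inv) / n) ** 2              # ((1/n) tr (K/n + lam I)^{-1})^2
    return num / den

# Gradient descent on the network parameters, but against the KARE objective
# (an estimate of generalization error) instead of the training loss.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (128, 10))
y = jnp.sin(X[:, 0]) + 0.1 * jax.random.normal(key, (128,))
params = init_params(key)
grad_kare = jax.jit(jax.grad(kare))
for step in range(100):
    g = grad_kare(params, X, y)
    params = jax.tree_util.tree_map(lambda p, gp: p - 1e-2 * gp, params, g)
```

After training, the resulting kernel would be used in a standard kernel ridge regression; the plain gradient-descent update here stands in for whatever optimizer one would use in practice.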
Similar Papers
Neural Tangent Kernels for Complex Genetic Risk Prediction: Bridging Deep Learning and Kernel Methods in Genomics
Applications
Finds hidden disease risks in your genes.
Nonlocal Neural Tangent Kernels via Parameter-Space Interactions
Machine Learning (CS)
Helps computers learn from messy, imperfect data.
Divergence of Empirical Neural Tangent Kernel in Classification Problems
Machine Learning (CS)
Shows how some computer brains learn differently than expected.