Transfer Learning in Infinite Width Feature Learning Networks
By: Clarissa Lauditi, Blake Bordelon, Cengiz Pehlevan
Potential Business Impact:
Shows when pretraining a model on one task helps it learn a related task faster and with less data.
We develop a theory of transfer learning in infinitely wide neural networks where both the pretraining (source) and downstream (target) tasks can operate in a feature learning regime. We analyze both the Bayesian framework, where learning is described by a posterior distribution over the weights, and gradient flow training of randomly initialized networks with weight decay. Both settings track how representations evolve on the source and target tasks. The summary statistics of these theories are adapted feature kernels which, after transfer learning, depend on data and labels from both source and target tasks. Reuse of features during transfer learning is governed by an elastic weight coupling that controls how strongly the network relies on features learned during training on the source task. We apply our theory to linear and polynomial regression tasks as well as real datasets. Our theory and experiments reveal how elastic weight coupling, feature learning strength, dataset size, and the alignment between source and target tasks jointly determine the utility of transfer learning.
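For intuition only, here is a minimal sketch of the elastic-coupling idea in a toy random-feature ridge regression: weights fit on a source task serve as an anchor, and a coupling strength gamma penalizes the target-task weights for moving away from that anchor. The feature map, task definitions, and parameter values below are illustrative assumptions, not the paper's infinite-width derivation.

```python
# Illustrative sketch (not the paper's setup): an elastic weight coupling
# gamma * ||w - w_source||^2 controls how strongly the target task reuses
# weights learned on the source task, in a random-feature ridge regression.
import numpy as np

rng = np.random.default_rng(0)
d, width, n_src, n_tgt = 20, 200, 500, 50

# Shared random feature map (a crude stand-in for a wide network's features).
W0 = rng.normal(size=(d, width)) / np.sqrt(d)
phi = lambda X: np.tanh(X @ W0)

# Related source and target tasks: linear targets with partially aligned directions.
beta_src = rng.normal(size=d)
beta_tgt = 0.8 * beta_src + 0.2 * rng.normal(size=d)

X_src = rng.normal(size=(n_src, d)); y_src = X_src @ beta_src
X_tgt = rng.normal(size=(n_tgt, d)); y_tgt = X_tgt @ beta_tgt
X_test = rng.normal(size=(1000, d)); y_test = X_test @ beta_tgt

def fit(Phi, y, base_reg, gamma=0.0, anchor=None):
    """Minimize ||Phi w - y||^2 + base_reg ||w||^2 + gamma ||w - anchor||^2."""
    anchor = np.zeros(Phi.shape[1]) if anchor is None else anchor
    A = Phi.T @ Phi + (base_reg + gamma) * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y + gamma * anchor)

# Pretrain on the plentiful source task.
w_src = fit(phi(X_src), y_src, base_reg=1e-2)

# Transfer to the small target task with varying elastic coupling strength.
for gamma in [0.0, 1e-2, 1e0, 1e2]:
    w_tgt = fit(phi(X_tgt), y_tgt, base_reg=1e-2, gamma=gamma, anchor=w_src)
    err = np.mean((phi(X_test) @ w_tgt - y_test) ** 2)
    print(f"gamma={gamma:g}  target test MSE={err:.4f}")
```

In this toy setting, small gamma ignores the source solution, very large gamma simply copies it, and intermediate values trade off between the two; the paper analyzes the analogous trade-off for feature-learning infinite-width networks.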
Similar Papers
Transfer learning under latent space model
Methodology
Studies transfer learning for network data described by latent space models.
Wasserstein Transfer Learning
Machine Learning (CS)
Uses Wasserstein distances to transfer knowledge across related data distributions.
Adaptive kernel predictors from feature-learning infinite limits of neural networks
Machine Learning (CS)
Derives adaptive kernel predictors from feature-learning infinite-width limits of neural networks.