Navigating High Dimensional Concept Space with Metalearning
By: Max Gupta
Potential Business Impact:
Teaches computers to learn new ideas fast.
Rapidly learning abstract concepts from limited examples is a hallmark of human intelligence. This work investigates whether gradient-based meta-learning can equip neural networks with inductive biases for efficient few-shot acquisition of discrete concepts. I compare meta-learning methods against a supervised learning baseline on Boolean concepts (logical statements) generated by a probabilistic context-free grammar (PCFG). By systematically varying concept dimensionality (number of features) and recursive compositionality (depth of grammar recursion), I delineate the complexity regimes in which meta-learning robustly improves few-shot concept learning and those in which it does not. Meta-learners handle compositional complexity far better than featural complexity. I trace some reasons for this through a representational analysis of meta-learner weights and a loss landscape analysis showing that featural complexity roughens loss trajectories, which makes curvature-aware optimization more effective than first-order methods. I also find that increasing the number of adaptation steps in meta-SGD improves out-of-distribution generalization on complex concepts, with the extra adaptation acting to encourage exploration of rougher loss basins. Overall, this work highlights the intricacies of learning compositional versus featural complexity in high-dimensional concept spaces and charts a path toward understanding the role of second-order methods and extended gradient adaptation in few-shot concept learning.
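To make the meta-SGD adaptation idea concrete, here is a minimal sketch of the inner loop on a toy Boolean concept. It assumes a logistic model over binary features; the per-parameter step sizes `alpha` stand in for meta-SGD's learned learning rates (in the full method both `theta` and `alpha` are meta-trained across tasks, whereas here they are fixed for illustration, and the concept is a simple AND rather than a PCFG-sampled formula).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, X, y, eps=1e-9):
    """Binary cross-entropy of a logistic model on a concept's examples."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def inner_adapt(theta, alpha, X, y, steps):
    """Meta-SGD-style inner loop: take `steps` gradient updates on the
    support set, scaling each parameter's gradient elementwise by its
    (normally meta-learned) step size in `alpha`."""
    w = theta.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)  # logistic-loss gradient
        w = w - alpha * grad           # per-parameter step sizes
    return w

# Toy Boolean concept: y = x0 AND x1, with a constant bias feature appended.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

theta = np.zeros(3)      # stands in for the meta-learned initialization
alpha = np.full(3, 1.0)  # stands in for the learned step sizes

initial_loss = bce_loss(theta, X, y)
adapted = inner_adapt(theta, alpha, X, y, steps=100)
adapted_loss = bce_loss(adapted, X, y)
```

More adaptation steps let `w` travel further from the initialization, which is the mechanism the abstract points to for exploring rougher loss basins on complex concepts.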
Similar Papers
Compressive Meta-Learning
Machine Learning (CS)
Learns from data without seeing all of it.
Dynamic Design of Machine Learning Pipelines via Metalearning
Machine Learning (CS)
Makes smart computer programs learn faster and better.
Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
Machine Learning (CS)
Predicts how computer "brains" learn patterns faster.