A Theoretical Analysis of Compositional Generalization in Neural Networks: A Necessary and Sufficient Condition
By: Yuanpeng Li
Potential Business Impact:
Teaches computers to understand new word combinations.
Compositional generalization is a crucial property in artificial intelligence, enabling models to handle novel combinations of known components. While most deep learning models lack this capability, certain models succeed on specific tasks, suggesting that governing conditions exist. This paper derives a necessary and sufficient condition for compositional generalization in neural networks. Conceptually, the condition requires that (i) the computational graph match the true compositional structure of the task, and (ii) each component encode just enough information during training. The condition is supported by mathematical proofs, and it ties together architecture design, regularization, and properties of the training data. A carefully designed minimal example gives an intuitive understanding of the condition. We also discuss the condition's potential for assessing compositional generalization before training. This work is a fundamental theoretical study of compositional generalization in neural networks.
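To make the two conceptual requirements concrete, the following is a minimal sketch (not the paper's construction) of a toy model whose computational graph mirrors a two-part compositional structure, in the spirit of requirement (i). The task setup, class names, and dimensions are all illustrative assumptions.

    # Minimal sketch, assuming a toy task whose inputs factor into two
    # slots (x_a, x_b). All names here are hypothetical, not the paper's.
    import torch
    import torch.nn as nn

    class CompositionalModel(nn.Module):
        """Computational graph decoder(enc_a(x_a), enc_b(x_b)):
        each component sees only its own slot, matching the true
        compositional structure of the toy task (requirement (i))."""
        def __init__(self, vocab_a, vocab_b, dim=8):
            super().__init__()
            # Small embeddings so each component can, in principle,
            # encode "just enough" information about its slot; whether
            # it actually does after training is requirement (ii).
            self.enc_a = nn.Embedding(vocab_a, dim)
            self.enc_b = nn.Embedding(vocab_b, dim)
            self.decoder = nn.Linear(2 * dim, vocab_a * vocab_b)

        def forward(self, x_a, x_b):
            # The graph composes slot-wise representations, so a novel
            # (x_a, x_b) pair reuses components trained on each part.
            z = torch.cat([self.enc_a(x_a), self.enc_b(x_b)], dim=-1)
            return self.decoder(z)

    model = CompositionalModel(vocab_a=5, vocab_b=5)
    logits = model(torch.tensor([0]), torch.tensor([3]))  # unseen combination

By contrast, a monolithic model that consumes the concatenated raw input with a single dense network has no graph structure to match the task's composition, so the condition offers no generalization guarantee for it.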
Similar Papers
Learning by Analogy: A Causal Framework for Composition Generalization
Machine Learning (CS)
Lets computers understand new ideas by breaking them down.
Scalable Evaluation and Neural Models for Compositional Generalization
Machine Learning (CS)
Teaches computers to understand new things from old.
Scale leads to compositional generalization
Machine Learning (CS)
Computers learn to combine ideas to do new tasks.