How do Transformer Embeddings Represent Compositions? A Functional Analysis
By: Aishik Nagar, Ishaan Singh Rawal, Mansi Dhanania and more
Potential Business Impact:
Computers understand word parts better.
Compositionality is a key aspect of human intelligence, essential for reasoning and generalization. While transformer-based models have become the de facto standard for many language modeling tasks, little is known about how they represent compound words, and whether these representations are compositional. In this study, we test compositionality in Mistral, OpenAI Large, and Google embedding models, and compare them with BERT. First, we evaluate compositionality in the representations by examining six diverse models of compositionality (addition, multiplication, dilation, regression, etc.). We find that ridge regression, albeit linear, best accounts for compositionality. Surprisingly, we find that the classic vector addition model performs almost as well as any other model. Next, we find that most embedding models are highly compositional, while BERT shows much poorer compositionality. We verify and visualize our findings with a synthetic dataset consisting of fully transparent adjective-noun compositions. Overall, we present a thorough investigation of compositionality.
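To make the comparison in the abstract concrete, here is a minimal sketch of three of the named composition models (addition, multiplication, ridge regression), each scored by cosine similarity against the compound's embedding. It is not the authors' code. The data are random toy vectors rather than real model embeddings, and the names `additive`, `multiplicative`, `fit_ridge`, and `ridge_compose`, the toy data generator, and the regularization strength `lam` are all assumptions made for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def additive(u, v):
    """Vector addition model: compound ~ u + v."""
    return u + v

def multiplicative(u, v):
    """Element-wise multiplication model: compound ~ u * v."""
    return u * v

def fit_ridge(U, V, C, lam=1.0):
    """Closed-form ridge regression from concatenated [u; v] to the compound.

    Minimizes ||X W - C||^2 + lam * ||W||^2 with X = [U, V].
    """
    X = np.hstack([U, V])
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ C)

def ridge_compose(W, u, v):
    """Apply the learned linear map to one constituent pair."""
    return np.concatenate([u, v]) @ W

# Toy data (assumption): the "compound" is a noisy weighted sum of its parts.
rng = np.random.default_rng(0)
n, dim, n_train = 200, 16, 150
U = rng.normal(size=(n, dim))
V = rng.normal(size=(n, dim))
C = 0.6 * U + 0.4 * V + 0.05 * rng.normal(size=(n, dim))

# Fit ridge on the training split, then score every model on held-out pairs.
W = fit_ridge(U[:n_train], V[:n_train], C[:n_train], lam=1.0)
held_out = list(zip(U[n_train:], V[n_train:], C[n_train:]))

add_score = np.mean([cosine(additive(u, v), c) for u, v, c in held_out])
mult_score = np.mean([cosine(multiplicative(u, v), c) for u, v, c in held_out])
ridge_score = np.mean([cosine(ridge_compose(W, u, v), c) for u, v, c in held_out])

print(f"addition:       {add_score:.3f}")
print(f"multiplication: {mult_score:.3f}")
print(f"ridge:          {ridge_score:.3f}")
```

Because the toy compounds are built as a weighted sum, the ranking here is true by construction and says nothing about real embeddings. The sketch only shows how such a comparison can be set up: each composition function predicts a compound vector, and the mean held-out cosine similarity serves as the compositionality score.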
Similar Papers
Quantifying Compositionality of Classic and State-of-the-Art Embeddings
Computation and Language
Helps computers understand new word meanings.
The aftermath of compounds: Investigating Compounds and their Semantic Representations
Computation and Language
Helps computers understand word meanings better.
Layer Specialization Underlying Compositional Reasoning in Transformers
Machine Learning (CS)
Computers learn to build new ideas from old ones.