The Aftermath of Compounds: Investigating Compounds and Their Semantic Representations
By: Swarang Joshi
Potential Business Impact:
Helps computers understand word meanings better.
This study investigates how well computational embeddings align with human semantic judgments in the processing of English compound words. We compare static word vectors (GloVe) and contextualized embeddings (BERT) against human ratings of lexeme meaning dominance (LMD) and semantic transparency (ST) drawn from a psycholinguistic dataset. Using measures of association strength (Edinburgh Associative Thesaurus), frequency (BNC), and predictability (LaDEC), we compute embedding-derived LMD and ST metrics and assess their relationships with human judgments via Spearman's correlation and regression analyses. Our results show that BERT embeddings better capture compositional semantics than GloVe, and that predictability ratings are strong predictors of semantic transparency in both human and model data. These findings advance computational psycholinguistics by clarifying the factors that drive compound word processing and offering insights into embedding-based semantic modeling.
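To make the embedding-derived metrics concrete, here is a minimal sketch (not the authors' released code) of one common way to proxy semantic transparency: score each compound by the similarity of its vector to its constituents' vectors, then correlate those scores with human ratings. The function names, the example compounds, the stand-in random vectors, and the placeholder human ratings are all illustrative assumptions; in the actual study the vectors would come from GloVe or BERT and the ratings from the psycholinguistic dataset.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def transparency(compound: str, modifier: str, head: str,
                 vec: dict) -> float:
    """Embedding-based ST proxy: mean similarity of the compound's vector
    to the vectors of its modifier and head constituents."""
    c = vec[compound]
    return 0.5 * (cosine(c, vec[modifier]) + cosine(c, vec[head]))

# Hypothetical example items: (compound, modifier, head) with placeholder
# human semantic-transparency ratings (e.g. on a 1-7 scale).
items = [("snowball", "snow", "ball"),
         ("deadline", "dead", "line"),
         ("hogwash", "hog", "wash")]
human_st = [6.1, 3.5, 2.3]

# Stand-in 300-dimensional vectors; replace with GloVe lookups or
# averaged BERT token embeddings in a real analysis.
vec = {w: np.random.randn(300) for triple in items for w in triple}

model_st = [transparency(c, m, h, vec) for c, m, h in items]
rho, p = spearmanr(model_st, human_st)
print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")
```

The same template extends to the LMD measure by comparing how strongly the compound vector leans toward one constituent over the other, and the resulting model scores can be entered alongside frequency and predictability as predictors in the regression analyses described above.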
Similar Papers
Semantic Structure in Large Language Model Embeddings
Computation and Language
Words have simple meanings inside computers.
How do Transformer Embeddings Represent Compositions? A Functional Analysis
Computation and Language
Computers understand word parts better.
Word Meanings in Transformer Language Models
Computation and Language
Computers understand word meanings like people do.