Resource-sensitive but language-blind: Community size and not grammatical complexity better predicts the accuracy of Large Language Models in a novel Wug Test
By: Nikoleta Pantelidou, Evelina Leivada, Paolo Morosi
Potential Business Impact:
Computers handle new words about as well as people do, but their accuracy depends on how much data a language has, not on its grammar.
The linguistic abilities of Large Language Models are a matter of ongoing debate. This study contributes to this discussion by investigating model performance in a morphological generalization task that involves novel words. Using a multilingual adaptation of the Wug Test, six models were tested across four partially unrelated languages (Catalan, English, Greek, and Spanish) and compared with human speakers. The aim is to determine whether model accuracy approximates human competence and whether it is shaped primarily by linguistic complexity or by the quantity of available training data. Consistent with previous research, the results show that the models are able to generalize morphological processes to unseen words with human-like accuracy. However, accuracy patterns align more closely with community size and data availability than with structural complexity, refining earlier claims in the literature. In particular, languages with larger speaker communities and stronger digital representation, such as Spanish and English, yielded higher accuracy than less-resourced ones like Catalan and Greek. Overall, our findings suggest that model behavior is mainly driven by the richness of linguistic resources rather than by sensitivity to grammatical complexity, reflecting a form of performance that resembles human linguistic competence only superficially.
Similar Papers
Towards Fundamental Language Models: Does Linguistic Competence Scale with Model Size?
Computation and Language
Makes AI smarter by separating words from facts.
Are the LLMs Capable of Maintaining at Least the Language Genus?
Computation and Language
Computers understand languages better when they're related.
From Phonemes to Meaning: Evaluating Large Language Models on Tamil
Computation and Language
Tests computers on Tamil language understanding.