Are the LLMs Capable of Maintaining at Least the Language Genus?
By: Sandra Mitrović, David Kletz, Ljiljana Dolamic and more
Potential Business Impact:
Computers understand languages better when they're related.
Large Language Models (LLMs) display notable variation in multilingual behavior, yet the role of genealogical language structure in shaping this variation remains underexplored. In this paper, we investigate whether LLMs exhibit sensitivity to linguistic genera by extending prior analyses on the MultiQ dataset. We first examine whether models prefer to switch to genealogically related languages when prompt language fidelity is not maintained. Next, we investigate whether knowledge consistency is better preserved within than across genera. We show that genus-level effects are present but strongly conditioned by training resource availability. We further observe distinct multilingual strategies across LLM families. Our findings suggest that LLMs encode aspects of genus-level structure, but training data imbalances remain the primary factor shaping their multilingual performance.
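To make the first analysis concrete, here is a minimal sketch of how one could measure whether language switches stay within the prompt's genus. It assumes a hypothetical mapping from language codes to genera (a real study would use a genealogical database such as WALS or Glottolog) and a list of (prompt language, response language) pairs for cases where the model did not answer in the prompt language; the function name and data are illustrative, not taken from the paper.

```python
from collections import Counter

# Hypothetical genus mapping (illustrative subset only; assumed, not from the paper).
GENUS = {
    "es": "Romance", "it": "Romance", "fr": "Romance",
    "de": "Germanic", "nl": "Germanic", "en": "Germanic",
    "ru": "Slavic", "pl": "Slavic",
}

def genus_switch_rate(switches):
    """Share of language switches that stay within the prompt language's genus.

    `switches` is a list of (prompt_lang, response_lang) pairs in which the
    model did NOT answer in the prompt language (i.e., fidelity was broken).
    """
    counts = Counter()
    for prompt_lang, response_lang in switches:
        if prompt_lang not in GENUS or response_lang not in GENUS:
            counts["unknown"] += 1       # language outside our mapping
        elif GENUS[prompt_lang] == GENUS[response_lang]:
            counts["same_genus"] += 1    # switch stayed within the genus
        else:
            counts["cross_genus"] += 1   # switch crossed genus boundaries
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

# Toy usage: two of three switches remain within the prompt's genus.
example = [("es", "it"), ("nl", "de"), ("pl", "en")]
print(genus_switch_rate(example))
```

Comparing the same-genus share against a chance baseline (e.g., the genus distribution over all candidate response languages) is one simple way to test whether switches favor genealogically related languages beyond what resource availability alone would predict.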
Similar Papers
The Model's Language Matters: A Comparative Privacy Analysis of LLMs
Computation and Language
Languages change how well AI keeps secrets.
Do LLMs exhibit the same commonsense capabilities across languages?
Computation and Language
Computers understand and write stories in many languages.
LLMs Know More Than Words: A Genre Study with Syntax, Metaphor & Phonetics
Computation and Language
Helps computers understand poetry and stories better.