Artificially Fluent: Swahili AI Performance Benchmarks Between English-Trained and Natively-Trained Datasets
By: Sophie Jaffer, Simeon Sayer
Potential Business Impact:
AI understands Swahili better when trained in Swahili.
As large language models (LLMs) expand their multilingual capabilities, questions remain about the equity of their performance across languages. While many communities stand to benefit from AI systems, the dominance of English in training data risks disadvantaging non-English speakers. To test the hypothesis that such data disparities affect model performance, this study compares two monolingual BERT models: one trained and tested entirely on Swahili news data, and another on comparable English news data. To simulate how multilingual LLMs process non-English queries through internal translation and abstraction, we translated the Swahili news data into English and evaluated it with the English-trained model. Comparing this translate-then-evaluate pipeline against the fully Swahili pipeline isolates the effect of language consistency versus cross-lingual abstraction. The results show that, despite high-quality translation, the native Swahili-trained model outperformed the Swahili-to-English translated pipeline, producing roughly a quarter as many errors (0.36% vs. 1.47%). This gap suggests that translation alone does not bridge representational differences between languages: a model trained in one language may misinterpret translated inputs because its internal knowledge representation is imperfect, so native-language training remains important for reliable outcomes. In educational and informational contexts, even small performance gaps can compound inequality. Future research should prioritize dataset development for underrepresented languages and renewed attention to multilingual model evaluation, so that global AI deployment narrows rather than reinforces existing digital divides.
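The comparison described above reduces to two evaluation arms scored on the same held-out Swahili articles: classify natively with the Swahili-trained model, or translate to English first and classify with the English-trained model. The sketch below, using Hugging Face `transformers` pipelines, shows that structure only; the model paths, translator checkpoint, and toy data are hypothetical placeholders, not the authors' artifacts.

```python
# Minimal sketch of the paper's two evaluation arms; every model path and
# the toy data below are hypothetical placeholders, not the authors' artifacts.
from transformers import pipeline

# Held-out Swahili news articles and their gold topic labels (toy stand-ins).
swahili_texts = ["Timu ya taifa ilishinda mechi jana.", "Bei ya mafuta imepanda tena."]
gold_labels = ["sports", "business"]

# Arm 1: native pipeline -- a BERT classifier fine-tuned on Swahili news.
sw_classifier = pipeline("text-classification", model="path/to/swahili-news-bert")

# Arm 2: translate-then-classify -- Swahili-to-English MT, followed by a BERT
# classifier fine-tuned on comparable English news.
sw_en_translator = pipeline("translation", model="path/to/sw-en-translator")
en_classifier = pipeline("text-classification", model="path/to/english-news-bert")

def error_rate(predictions, gold):
    """Fraction of examples where the predicted label differs from gold."""
    wrong = sum(pred != g for pred, g in zip(predictions, gold))
    return wrong / len(gold)

# Native arm: Swahili inputs go straight to the Swahili-trained model.
native_preds = [out["label"] for out in sw_classifier(swahili_texts)]

# Translated arm: translate each article, then classify with the English model.
translations = [out["translation_text"] for out in sw_en_translator(swahili_texts)]
translated_preds = [out["label"] for out in en_classifier(translations)]

# The paper reports 0.36% (native) vs. 1.47% (translated) on its full test set.
print(f"native Swahili error rate:     {error_rate(native_preds, gold_labels):.2%}")
print(f"translate-then-English error:  {error_rate(translated_preds, gold_labels):.2%}")
```

The key design point, per the abstract, is that both arms score the same articles against the same gold labels, so any difference in error rate is attributable to the translation step and the English model's handling of translated inputs rather than to differences in the test material.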
Similar Papers
Language Diversity: Evaluating Language Usage and AI Performance on African Languages in Digital Spaces
Computation and Language
Helps computers understand African languages better.
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Computation and Language
Tests AI fairly in many languages.
Languages Still Left Behind: Toward a Better Multilingual Machine Translation Benchmark
Computation and Language
Makes language translators more accurate for everyone.