Evaluating Polish linguistic and cultural competency in large language models
By: Sławomir Dadas, Małgorzata Grębowiec, Michał Perełkiewicz, and others
Potential Business Impact:
Tests if computers understand Polish culture.
Large language models (LLMs) are becoming increasingly proficient at processing and generating multilingual text, which allows them to address real-world problems more effectively. However, language understanding is a far more complex issue that goes beyond simple text analysis. It requires familiarity with cultural context, including references to everyday life, historical events, traditions, folklore, literature, and pop culture. A lack of such knowledge can lead to misinterpretations and subtle, hard-to-detect errors. To examine language models' knowledge of the Polish cultural context, we introduce the Polish linguistic and cultural competency benchmark, consisting of 600 manually crafted questions. The benchmark is divided into six categories: history, geography, culture & tradition, art & entertainment, grammar, and vocabulary. As part of our study, we conduct an extensive evaluation involving over 30 open-weight and commercial LLMs. Our experiments provide a new perspective on Polish-language competencies in language models, moving beyond traditional natural language processing tasks and general knowledge assessment.
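To make the evaluation setup concrete, the following is a minimal sketch of per-category scoring for a benchmark like this one. The question schema and answer format here are assumptions for illustration (the abstract does not specify how the 600 questions are encoded); only the six category names come from the paper.

```python
from collections import defaultdict

# Hypothetical question records -- the benchmark's actual schema is not
# described in the abstract, so this format is illustrative only.
# Categories match those named in the paper.
QUESTIONS = [
    {"category": "history", "gold": "B"},
    {"category": "grammar", "gold": "A"},
    {"category": "vocabulary", "gold": "C"},
]

def score_by_category(questions, answers):
    """Compute per-category accuracy for one model's answers.

    `answers` is a list of answer labels, aligned with `questions`.
    Returns a dict mapping category name -> accuracy in [0, 1].
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for question, answer in zip(questions, answers):
        total[question["category"]] += 1
        if answer == question["gold"]:
            correct[question["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Example: a model answered B, A, B on the three questions above.
scores = score_by_category(QUESTIONS, ["B", "A", "B"])
print(scores)  # history and grammar correct, vocabulary wrong
```

Reporting accuracy per category, rather than a single aggregate score, is what lets a benchmark like this separate factual gaps (e.g. history) from linguistic ones (grammar, vocabulary).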
Similar Papers
LLMzSzŁ: a comprehensive LLM benchmark for Polish
Computation and Language
Tests computers on Polish school exams.
From Facts to Folklore: Evaluating Large Language Models on Bengali Cultural Knowledge
Computation and Language
Helps computers understand Bengali culture better.
SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia
Computation and Language
Helps computers understand Saudi culture better.