MyCulture: Exploring Malaysia's Diverse Culture under Low-Resource Language Constraints
By: Zhong Ken Hew, Jia Xin Low, Sze Jue Yang and more
Potential Business Impact:
Tests if AI understands Malaysian culture fairly.
Large Language Models (LLMs) often exhibit cultural biases due to training data dominated by high-resource languages like English and Chinese. This poses challenges for accurately representing and evaluating diverse cultural contexts, particularly in low-resource language settings. To address this, we introduce MyCulture, a benchmark designed to comprehensively evaluate LLMs on Malaysian culture across six pillars: arts, attire, customs, entertainment, food, and religion, all presented in Bahasa Melayu. Unlike conventional benchmarks, MyCulture employs a novel open-ended multiple-choice question format without predefined options, thereby reducing guessing and mitigating format bias. We provide a theoretical justification for the effectiveness of this open-ended structure in improving both fairness and discriminative power. Furthermore, we analyze structural bias by comparing model performance on structured versus free-form outputs, and assess language bias through multilingual prompt variations. Our evaluation across a range of regional and international LLMs reveals significant disparities in cultural comprehension, highlighting the urgent need for culturally grounded and linguistically inclusive benchmarks in the development and assessment of LLMs.
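The claim that dropping predefined options reduces guessing can be illustrated with a short sketch. It is not taken from the paper: the function names are hypothetical, and it assumes a model that guesses uniformly at random among the answers it could plausibly give. Under that assumption, a conventional question with k listed options has a guessing floor of 1/k, and the floor shrinks as the space of possible answers grows.

```python
def chance_accuracy(num_candidates: int) -> float:
    """Expected accuracy of uniform random guessing over num_candidates answers.

    Assumption (not from the paper): guesses are uniform and exactly one
    candidate is correct.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    return 1.0 / num_candidates


def chance_corrected(accuracy: float, num_candidates: int) -> float:
    """Rescale accuracy so random guessing maps to 0 and a perfect score to 1."""
    if num_candidates < 2:
        raise ValueError("need at least 2 candidates to correct for chance")
    floor = chance_accuracy(num_candidates)
    return (accuracy - floor) / (1.0 - floor)


if __name__ == "__main__":
    # A 4-option question rewards blind guessing far more than a question
    # whose answer must be produced without a visible option list.
    for k in (4, 10, 100):
        print(f"{k} candidates: guessing floor = {chance_accuracy(k):.3f}")
```

For example, a raw accuracy of 0.625 on 4-option questions corresponds to a chance-corrected score of 0.5, which is why scores from formats with different guessing floors are not directly comparable.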
Similar Papers
From Facts to Folklore: Evaluating Large Language Models on Bengali Cultural Knowledge
Computation and Language
Helps computers understand Bengali culture better.
MELAC: Massive Evaluation of Large Language Models with Alignment of Culture in Persian Language
Computation and Language
Helps computers understand Persian language and culture better.