Afri-MCQA: Multimodal Cultural Question Answering for African Languages
By: Atnafu Lambebo Tonja, Srija Anand, Emilio Villa-Cueva, and more
Potential Business Impact:
Helps computers understand African languages and cultures.
Africa is home to over one-third of the world's languages, yet remains underrepresented in AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark covering 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel English-African language Q&A pairs across text and speech modalities and was created entirely by native speakers. Benchmarking large language models (LLMs) on Afri-MCQA shows that open-weight models perform poorly across the evaluated cultures, with near-zero accuracy on open-ended VQA when queried in a native language or via speech. To assess linguistic competence separately from cultural knowledge, we include control experiments and observe significant performance gaps between native languages and English for both text and speech. These findings underscore the need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. To support more inclusive multimodal AI development in African languages, we release Afri-MCQA under an academic license or CC BY-NC 4.0 on HuggingFace (https://huggingface.co/datasets/Atnafu/Afri-MCQA).
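For readers who want to explore the released benchmark, a minimal loading sketch using the Hugging Face `datasets` library is shown below. The repository ID comes from the link above; the configuration and split names are not specified in the abstract, so they are left to the dataset card and only inspected generically here.

```python
from datasets import load_dataset

# Load the Afri-MCQA benchmark from the Hugging Face Hub.
# NOTE: language configurations and split names are assumptions not given in
# the abstract; check https://huggingface.co/datasets/Atnafu/Afri-MCQA for
# the actual structure before relying on specific fields.
afri_mcqa = load_dataset("Atnafu/Afri-MCQA")

# Show the available splits and one sample Q&A record.
print(afri_mcqa)
first_split = next(iter(afri_mcqa.values()))
print(first_split[0])
```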
Similar Papers
AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR
Computation and Language
Tests voice tools for over 100 African accents.
From National Curricula to Cultural Awareness: Constructing Open-Ended Culture-Specific Question Answering Dataset
Computation and Language
Teaches computers Korean culture for better answers.
Beyond MCQ: An Open-Ended Arabic Cultural QA Benchmark with Dialect Variants
Computation and Language
Helps computers understand different Arabic languages better.