Score: 0

BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture

Published: November 5, 2025 | arXiv ID: 2511.03180v1

By: Shahriyar Zaman Ridoy, Azmine Toushik Wasi, Koushik Ahamed Tonmoy

Potential Business Impact:

Helps AI understand right and wrong in Bengali.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

As multilingual Large Language Models (LLMs) gain traction across South Asia, their alignment with local ethical norms, particularly for Bengali, which is spoken by over 285 million people and ranked 6th globally, remains underexplored. Existing ethics benchmarks are largely English-centric and shaped by Western frameworks, overlooking cultural nuances critical for real-world deployment. To address this, we introduce BengaliMoralBench, the first large-scale ethics benchmark for the Bengali language and socio-cultural contexts. It covers five moral domains, Daily Activities, Habits, Parenting, Family Relationships, and Religious Activities, subdivided into 50 culturally relevant subtopics. Each scenario is annotated via native-speaker consensus using three ethical lenses: Virtue, Commonsense, and Justice ethics. We conduct systematic zero-shot evaluation of prominent multilingual LLMs, including Llama, Gemma, Qwen, and DeepSeek, using a unified prompting protocol and standard metrics. Performance varies widely (50-91% accuracy), with qualitative analysis revealing consistent weaknesses in cultural grounding, commonsense reasoning, and moral fairness. BengaliMoralBench provides a foundation for responsible localization, enabling culturally aligned evaluation and supporting the deployment of ethically robust AI in diverse, low-resource multilingual settings such as Bangladesh.

LLM Ethics Benchmark: A Three-Dimensional Assessment System for Evaluating Moral Reasoning in Large Language Models

Computers and Society

Tests if AI makes good and fair choices.

1 May 2025 2

89%

From Facts to Folklore: Evaluating Large Language Models on Bengali Cultural Knowledge

Computation and Language

Helps computers understand Bengali culture better.

22 Oct 2025 0

89%

BanglaMATH : A Bangla benchmark dataset for testing LLM mathematical reasoning at grades 6, 7, and 8

Computers and Society

Helps computers solve math problems in Bengali.

13 Oct 2025 1

View PDF Login to Bookmark

Country of Origin

🇧🇩 Bangladesh

Page Count

36 pages

BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture

Helps AI understand right and wrong in Bengali.

Technical Abstract

LLM Ethics Benchmark: A Three-Dimensional Assessment System for Evaluating Moral Reasoning in Large Language Models

From Facts to Folklore: Evaluating Large Language Models on Bengali Cultural Knowledge

BanglaMATH : A Bangla benchmark dataset for testing LLM mathematical reasoning at grades 6, 7, and 8