BNLI: A Linguistically-Refined Bengali Dataset for Natural Language Inference
By: Farah Binta Haque, Md Yasin, Shishir Saha and more
Potential Business Impact:
Helps computers understand Bengali sentences better.
Despite growing progress in Natural Language Inference (NLI) research, resources for Bengali remain extremely limited. Existing Bengali NLI datasets exhibit several inconsistencies, including annotation errors, ambiguous sentence pairs, and inadequate linguistic diversity, which hinder effective model training and evaluation. To address these limitations, we introduce BNLI, a refined and linguistically curated Bengali NLI dataset designed to support robust language understanding and inference modeling. The dataset was constructed through a rigorous annotation pipeline emphasizing semantic clarity and balance across the entailment, contradiction, and neutral classes. We benchmarked BNLI using a suite of state-of-the-art transformer-based architectures, including multilingual and Bengali-specific models, to assess their ability to capture complex semantic relations in Bengali text. The experimental findings highlight the improved reliability and interpretability achieved with BNLI, establishing it as a strong foundation for advancing research on inference tasks in Bengali and other low-resource languages.
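The class balance mentioned above can be illustrated with a small sketch. It assumes each example is a dictionary with a `label` field taking one of three string values. The field name, the label strings, the helper functions, and the toy English sentence pairs are all hypothetical placeholders, since the abstract does not describe BNLI's actual schema or contents.

```python
from collections import Counter

# Assumed label names; the abstract does not specify BNLI's label encoding.
LABELS = ("entailment", "contradiction", "neutral")


def label_distribution(examples):
    """Return the fraction of examples that carry each NLI label."""
    counts = Counter(ex["label"] for ex in examples)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        raise ValueError("no examples with a recognized label")
    return {label: counts[label] / total for label in LABELS}


def imbalance_ratio(distribution):
    """Most frequent class share divided by the least frequent one.

    A value of 1.0 means the classes are perfectly balanced.
    """
    smallest = min(distribution.values())
    if smallest == 0:
        return float("inf")
    return max(distribution.values()) / smallest


# Toy English stand-ins for premise/hypothesis pairs (not taken from BNLI).
toy = [
    {"premise": "A man is playing a guitar.",
     "hypothesis": "A person is making music.", "label": "entailment"},
    {"premise": "A child is reading a book.",
     "hypothesis": "Someone is reading.", "label": "entailment"},
    {"premise": "A man is playing a guitar.",
     "hypothesis": "The man is asleep.", "label": "contradiction"},
    {"premise": "A man is playing a guitar.",
     "hypothesis": "The man is on a stage.", "label": "neutral"},
]

dist = label_distribution(toy)
print(dist)
print(imbalance_ratio(dist))
```

A ratio well above 1.0 on a real split would suggest that one class dominates, which is the kind of skew a balanced annotation pipeline aims to avoid.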