
Stemming -- The Evolution and Current State with a Focus on Bangla

Published: August 21, 2025 | arXiv ID: 2508.15711v1

By: Abhijit Paul, Mashiat Amin Farin, Sharif Md. Abdullah, and others

Potential Business Impact:

Helps computers understand Bengali words better.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Bangla, the seventh most widely spoken language worldwide with 300 million native speakers, is digitally under-represented because of limited resources and a lack of annotated datasets. Stemming, a critical preprocessing step in language analysis, is essential for low-resource, highly inflectional languages like Bangla: by mapping inflected variants to a common stem, it significantly reduces the number of distinct words that algorithms and models must handle, and with it their complexity. This paper conducts a comprehensive survey of stemming approaches, emphasizing the importance of handling morphological variants effectively. A review of the Bangla stemming landscape reveals a significant gap in the existing literature. The paper highlights the lack of continuity with previous research and the scarcity of accessible implementations for replication. It also critiques current evaluation methodologies, stressing the need for more relevant metrics. The paper acknowledges the challenges posed by Bangla's rich morphology and diverse dialects and, to address them, suggests directions for Bangla stemmer development. It concludes by advocating for robust Bangla stemmers and continued research in the field to enhance language analysis and processing.
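To make the vocabulary-reduction argument concrete, here is a minimal Python sketch of suffix stripping, the classic rule-based style of stemming. It is a toy under stated assumptions: the suffix list, the `stem` function, `MIN_STEM_LEN`, and the sample words are all hypothetical and chosen for illustration, not taken from the paper or from any published Bangla stemmer.

```python
# Toy longest-match suffix stripper for Bangla. The suffix list is a small,
# hypothetical sample of common inflections chosen for illustration; it is not
# the rule set of any stemmer discussed in the paper.
SUFFIXES = sorted(
    [
        "গুলো", "গুলি",    # plural markers
        "দের", "রা",       # plural genitive/objective, plural nominative
        "টি", "টা",        # classifiers
        "কে", "তে", "ের",  # objective, locative, genitive
    ],
    key=len,
    reverse=True,  # try longer suffixes first so "দের" wins over "ের"
)

# Assumed guard against over-stemming; counted in Unicode code points, which
# only approximates user-perceived Bangla characters (graphemes).
MIN_STEM_LEN = 2


def stem(word: str) -> str:
    """Remove the longest listed suffix, if a long-enough stem remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


# Inflected forms of ছেলে ("boy") and বই ("book").
TOKENS = ["ছেলে", "ছেলেরা", "ছেলেদের", "ছেলেটি", "বই", "বইগুলো", "বইটি", "বইকে"]

if __name__ == "__main__":
    stems = {stem(t) for t in TOKENS}
    # The 8 distinct surface forms collapse to 2 stems.
    print(len(set(TOKENS)), "->", len(stems), sorted(stems))
```

Running it shows eight distinct surface forms reduced to two stems. The same flat list also hints at why Bangla is hard: it cannot recognize the linking য় in বইয়ের ("of the book"), so that form stems to বইয় rather than বই, which is the kind of morphological detail a more complete stemmer has to handle.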


Page Count: 10 pages
