Detecting AI-Generated Paraphrases in Bengali: A Comparative Study of Zero-Shot and Fine-Tuned Transformers
By: Md. Rakibul Islam, Most. Sharmin Sultana Samu, Md. Zahid Hossain and more
Potential Business Impact:
Finds fake writing in Bengali.
Large language models (LLMs) can produce text that closely resembles human writing. This capability raises concerns about misuse, including disinformation and content manipulation, so detecting AI-generated text is essential for preserving authenticity and preventing malicious applications. Existing research has addressed detection in multiple languages, but Bengali remains largely unexplored. Its rich vocabulary and complex structure make distinguishing human-written from AI-generated text particularly challenging. This study investigates five transformer-based models: XLM-RoBERTa-Large, mDeBERTa-V3-Base, BanglaBERT-Base, IndicBERT-Base and MultilingualBERT-Base. In zero-shot evaluation, all models perform near chance level (around 50% accuracy), highlighting the need for task-specific fine-tuning. Fine-tuning significantly improves performance: XLM-RoBERTa, mDeBERTa and MultilingualBERT each reach around 91% in both accuracy and F1-score. IndicBERT performs comparatively worse, suggesting that fine-tuning is less effective for it on this task. This work advances AI-generated text detection in Bengali and establishes a foundation for building robust systems to counter AI-generated content.
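To make the reported numbers concrete, here is a minimal sketch of how accuracy and F1 are computed for a binary human-vs-AI detector. It assumes a label encoding of 1 = AI-generated and 0 = human-written and a class-balanced test set; the abstract states neither, nor whether its F1 is per-class or macro-averaged, so both variants are shown. The toy labels (`Y_TRUE`, `CONSTANT_PRED`, `GOOD_PRED`) are hypothetical and are not data from the paper. On a balanced set, a model that always predicts one class scores exactly 50% accuracy, which is why "around 50%" reads as chance level.

```python
from typing import Sequence

# Assumed label convention (not stated in the paper): 1 = AI-generated, 0 = human-written.
AI, HUMAN = 1, 0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """F1 = 2PR / (P + R), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:  # no true positives: precision or recall is 0 (or undefined), so F1 = 0
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for both classes."""
    return (f1_for_class(y_true, y_pred, AI) + f1_for_class(y_true, y_pred, HUMAN)) / 2


# Hypothetical balanced test set: 5 AI-generated and 5 human-written samples.
Y_TRUE = [AI] * 5 + [HUMAN] * 5
# A degenerate "chance-level" model that labels everything as AI-generated.
CONSTANT_PRED = [AI] * 10
# A hypothetical fine-tuned model that misses one AI-generated sample.
GOOD_PRED = [AI] * 4 + [HUMAN] * 6

for name, pred in [("constant", CONSTANT_PRED), ("fine-tuned", GOOD_PRED)]:
    print(f"{name}: accuracy={accuracy(Y_TRUE, pred):.3f}, "
          f"F1(AI)={f1_for_class(Y_TRUE, pred, AI):.3f}, "
          f"macro-F1={macro_f1(Y_TRUE, pred):.3f}")
```

The constant predictor illustrates that accuracy alone can hide a useless model: its macro-F1 drops to about 0.33 because it never predicts the human class. Reporting both accuracy and F1, as the study does, guards against that.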
Similar Papers
AI-Generated Text Detection in Low-Resource Languages: A Case Study on Urdu
Computation and Language
Finds fake writing in Urdu.
AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models
Computation and Language
Finds fake writing made by computers.
Bangla Hate Speech Classification with Fine-tuned Transformer Models
Computation and Language
Helps computers find hate speech in Bengali.