LLMs for Low-Resource Dialect Translation Using Context-Aware Prompting: A Case Study on Sylheti

Published: November 24, 2025 | arXiv ID: 2511.21761v1

By: Tabia Tanzin Prama, Christopher M. Danforth, Peter Sheridan Dodds

Potential Business Impact:

Helps computers translate Sylheti dialect better.

Business Areas:

Language Learning, Education

Large Language Models (LLMs) have demonstrated strong translation abilities through prompting, even without task-specific training. However, their effectiveness in dialectal and low-resource contexts remains underexplored. This study presents the first systematic investigation of LLM-based machine translation (MT) for Sylheti, a dialect of Bangla that is itself low-resource. We evaluate five advanced LLMs (GPT-4.1, GPT-4.1, LLaMA 4, Grok 3, and DeepSeek V3.2) across both translation directions (Bangla ⇔ Sylheti) and find that these models struggle with dialect-specific vocabulary. To address this, we introduce Sylheti-CAP (Context-Aware Prompting), a three-step framework that embeds a linguistic rulebook, a dictionary (2,260 core vocabulary items and idioms), and an authenticity check directly into prompts. Extensive experiments show that Sylheti-CAP consistently improves translation quality across models and prompting strategies. Both automatic metrics and human evaluations confirm its effectiveness, while qualitative analysis reveals notable reductions in hallucinations, ambiguities, and awkward phrasing, establishing Sylheti-CAP as a scalable solution for dialectal and low-resource MT. Dataset link: https://github.com/TabiaTanzin/LLMs-for-Low-Resource-Dialect-Translation-Using-Context-Aware-Prompting-A-Case-Study-on-Sylheti.git
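The three ingredients of Sylheti-CAP named in the abstract (a linguistic rulebook, a dictionary, and an authenticity check) can be pictured as parts of a single assembled prompt. The Python sketch below is a hypothetical illustration, not the authors' implementation: the class `DialectContext`, the functions `relevant_entries` and `build_prompt`, the step wording, and all dictionary and rule content are assumptions, and the actual rulebook and vocabulary are not reproduced here. Filtering the dictionary down to entries that occur in the input sentence is also an assumption, made to keep the prompt short; the paper may instead embed the full dictionary.

```python
from dataclasses import dataclass


@dataclass
class DialectContext:
    """Hypothetical container for the three Sylheti-CAP ingredients (assumed layout)."""
    rulebook: list[str]          # linguistic rules, one instruction per string
    dictionary: dict[str, str]   # source-side entry -> target-side entry (words or idioms)
    authenticity_check: str      # final self-check instruction appended to the prompt


def relevant_entries(sentence: str, dictionary: dict[str, str]) -> dict[str, str]:
    """Keep entries whose source form occurs as whole words in the sentence.

    Assumption: naive whitespace matching (punctuation is not stripped), used
    only to keep the prompt short; the paper may embed the full dictionary.
    """
    padded = f" {sentence} "
    return {src: tgt for src, tgt in dictionary.items() if f" {src} " in padded}


def build_prompt(sentence: str, ctx: DialectContext,
                 source_lang: str = "Bangla", target_lang: str = "Sylheti") -> str:
    """Assemble a three-step, context-aware translation prompt (hypothetical wording)."""
    lines = [f"Translate the following {source_lang} sentence into {target_lang}."]
    # Step 1: linguistic rulebook
    lines.append("Step 1 - Follow these linguistic rules:")
    lines += [f"- {rule}" for rule in ctx.rulebook]
    # Step 2: dictionary entries (words and idioms) that apply to this sentence
    lines.append("Step 2 - Use these dictionary entries where they apply:")
    entries = relevant_entries(sentence, ctx.dictionary)
    if entries:
        lines += [f"- {src} => {tgt}" for src, tgt in entries.items()]
    else:
        lines.append("- (no dictionary entries matched)")
    # Step 3: authenticity check on the model's own output
    lines.append(f"Step 3 - {ctx.authenticity_check}")
    lines.append(f"Sentence: {sentence}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder content only; real rules and vocabulary are not reproduced here.
    demo = DialectContext(
        rulebook=["<rule 1>", "<rule 2>"],
        dictionary={"wordA": "dialectA", "phrase B": "dialectB"},
        authenticity_check="Verify that the output reads as natural Sylheti "
                           "and revise any standard-Bangla forms.",
    )
    print(build_prompt("wordA appears with phrase B", demo))
```

For the Sylheti-to-Bangla direction, one could swap `source_lang` and `target_lang` and pass an inverted dictionary such as `{v: k for k, v in d.items()}` (noting that inversion keeps only one source form when several map to the same target). The returned string would then be sent to whichever LLM is being evaluated.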

LLM-Based Evaluation of Low-Resource Machine Translation: A Reference-less Dialect Guided Approach with a Refined Sylheti-English Benchmark

Computation and Language

Helps computers translate languages with many dialects.

18 May 2025

Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti

Computation and Language

Translates rare languages better than big AI.

20 Oct 2025

Beyond the Sentence: A Survey on Context-Aware Machine Translation with Large Language Models

Computation and Language

Makes computer translations understand more context.

9 Jun 2025

Repos / Data Links

github.com

Page Count

17 pages