Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization
By: Ajwad Abrar, Farzana Tabassum, Sabbir Ahmed
Potential Business Impact:
Helps computers summarize health questions asked in Bengali.
Consumer Health Queries (CHQs) in Bengali (Bangla), a low-resource language, often contain extraneous details that complicate efficient medical responses. This study investigates the zero-shot performance of nine advanced large language models (LLMs): GPT-3.5-Turbo, GPT-4, Claude-3.5-Sonnet, Llama3-70b-Instruct, Mixtral-8x22b-Instruct, Gemini-1.5-Pro, Qwen2-72b-Instruct, Gemma-2-27b, and Athene-70B, in summarizing Bangla CHQs. Using the BanglaCHQ-Summ dataset of 2,350 annotated query-summary pairs, we benchmarked these LLMs with ROUGE metrics against Bangla T5, a fine-tuned state-of-the-art model. Mixtral-8x22b-Instruct emerged as the top-performing model on ROUGE-1 and ROUGE-L, while Bangla T5 excelled on ROUGE-2. The results demonstrate that zero-shot LLMs can rival fine-tuned models, producing high-quality summaries even without task-specific training. This work underscores the potential of LLMs for low-resource languages, providing scalable solutions for healthcare query summarization.
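The ROUGE-N metrics used in the benchmark measure n-gram overlap between a model's summary and a reference summary. As a minimal illustration (not the paper's evaluation code), the core computation can be sketched in pure Python; note that whitespace tokenization is a simplification here, and a real Bangla evaluation would typically use a language-aware tokenizer:

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """Compute ROUGE-N F1 between a candidate and a reference summary.

    Simplified sketch: tokens come from whitespace splitting, which is
    an approximation for Bangla text.
    """
    def ngrams(tokens, n):
        # Count every contiguous n-gram in the token sequence.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy example with English placeholder text (the study evaluates Bangla).
print(round(rouge_n("the patient has a fever", "patient has high fever"), 3))
```

ROUGE-2 (bigram overlap) rewards matching word order more strictly than ROUGE-1, which is consistent with the fine-tuned Bangla T5 leading on that metric while zero-shot Mixtral-8x22b-Instruct leads on ROUGE-1 and ROUGE-L.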
Similar Papers
Faithful Summarization of Consumer Health Queries: A Cross-Lingual Framework with LLMs
Computation and Language
Makes doctor notes easier to understand safely.
An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques
Computation and Language
Helps computers summarize long articles better.
Evaluating Large Language Models for Evidence-Based Clinical Question Answering
Computation and Language
Helps doctors answer patient questions better.