Exploring Data and Parameter Efficient Strategies for Arabic Dialect Identification
By: Vani Kanjirangat, Ljiljana Dolamic, Fabio Rinaldi
Potential Business Impact:
Helps computers tell different Arabic dialects apart.
This paper presents our exploration of data-efficient and parameter-efficient approaches to Arabic Dialect Identification (ADI). In particular, we investigate soft-prompting strategies, including prefix-tuning, prompt-tuning, P-tuning, and P-tuning V2, as well as LoRA reparameterization. For the data-efficient strategy, we use hard prompting with zero-shot and few-shot inference to probe the dialect identification capabilities of Large Language Models (LLMs). For the parameter-efficient fine-tuning (PEFT) approaches, we conduct experiments with Arabic-specific encoder models on several major datasets. We also analyze n-shot inference with open-source decoder-only models: a general multilingual model (Phi-3.5) and an Arabic-specific one (SILMA). We observe that LLMs generally struggle to differentiate dialectal nuances in zero-shot and few-shot setups. The soft-prompted encoder variants perform better, while the LoRA-based fine-tuned models perform best, even surpassing full fine-tuning.
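To make the parameter-efficient setups concrete, here is a minimal sketch, assuming the Hugging Face peft library, of how an Arabic encoder could be wrapped with the adapters named above. The checkpoint (CAMeL-Lab/bert-base-arabic-camelbert-mix), the label count, and all hyperparameter values are illustrative assumptions, not the paper's actual configuration.

    # Minimal PEFT sketch for ADI framed as sequence classification.
    # Assumptions (not from the abstract): encoder checkpoint, number
    # of dialect labels, and all hyperparameter values.
    from transformers import AutoModelForSequenceClassification
    from peft import (LoraConfig, PromptTuningConfig, PromptEncoderConfig,
                      PrefixTuningConfig, get_peft_model, TaskType)

    BASE = "CAMeL-Lab/bert-base-arabic-camelbert-mix"  # illustrative encoder
    NUM_DIALECTS = 18  # dataset-dependent

    def load_base():
        return AutoModelForSequenceClassification.from_pretrained(
            BASE, num_labels=NUM_DIALECTS)

    # LoRA: low-rank updates to the attention projections.
    lora = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16,
                      lora_dropout=0.1, target_modules=["query", "value"])
    # Prompt-tuning: learn virtual token embeddings, backbone frozen.
    prompt = PromptTuningConfig(task_type=TaskType.SEQ_CLS,
                                num_virtual_tokens=20)
    # P-tuning: virtual tokens produced by a small prompt encoder.
    p_tuning = PromptEncoderConfig(task_type=TaskType.SEQ_CLS,
                                   num_virtual_tokens=20,
                                   encoder_hidden_size=128)
    # Prefix-tuning: learned key/value prefixes at every layer
    # (P-tuning V2 applies the same deep-prompt idea to NLU tasks).
    prefix = PrefixTuningConfig(task_type=TaskType.SEQ_CLS,
                                num_virtual_tokens=20)

    for cfg in (lora, prompt, p_tuning, prefix):
        model = get_peft_model(load_base(), cfg)
        model.print_trainable_parameters()  # small fraction of all weights

On the data-efficient side, zero-shot and few-shot hard prompting only requires building an instruction string, optionally with labeled examples, for an instruct model such as Phi-3.5 or SILMA. The label set and wording below are illustrative assumptions, not the paper's prompts.

    # Hypothetical n-shot prompt builder for dialect identification.
    DIALECTS = ["Egyptian", "Gulf", "Levantine", "Maghrebi", "MSA"]

    def build_prompt(query, shots=()):
        """shots: (text, label) pairs; an empty tuple gives a zero-shot prompt."""
        parts = ["Identify the Arabic dialect of the text. "
                 "Answer with one of: " + ", ".join(DIALECTS) + "."]
        for text, label in shots:
            parts.append("Text: " + text + "\nDialect: " + label)
        parts.append("Text: " + query + "\nDialect:")
        return "\n\n".join(parts)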
Similar Papers
Advancing Dialectal Arabic to Modern Standard Arabic Machine Translation
Computation and Language
Translates everyday Arabic dialects into standard Arabic.
A Survey on Prompt Tuning
Computation and Language
Teaches computers new tricks without changing their brains.