Dialect Identification Using Resource-Efficient Fine-Tuning Approaches
By: Zirui Lin, Haris Gulzar, Monnika Roslianna Busto, and more
Potential Business Impact:
Teaches computers to recognize different accents using less memory and training time.
Dialect Identification (DI) is the task of recognizing different dialects of the same language from a speech signal. DI can improve downstream speech-related tasks even when speakers have a strong dialect. However, fine-tuning a speech model for tasks like DI is expensive in terms of computation cost and memory requirements. Recent studies have explored fine-tuning pre-trained speech models for tasks like DI using Parameter-Efficient Fine-Tuning (PEFT) methods, which offer parameter efficiency but only limited improvements in memory efficiency and training speed. To address these challenges, we explore Memory-Efficient Fine-Tuning (MEFT) methods, originally proposed for natural language processing, and apply them to a general-purpose pre-trained speech model. We then comprehensively analyze GPU memory usage and fine-tuning speed across the various MEFT methods. As a case study, we fine-tune the Whisper model to identify six Mandarin subdialects from the KeSpeech dataset, reducing GPU memory usage by up to 73.25% and accelerating training by a factor of 2.1, while maintaining accuracy comparable to vanilla fine-tuning and PEFT methods.
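To illustrate the memory-saving principle behind MEFT, here is a minimal sketch: if the pre-trained backbone is frozen and run without gradient tracking, its activations never need to be stored for backpropagation, so only a small trainable head consumes training memory. This sketch assumes the Hugging Face transformers Whisper implementation; the frozen-backbone setup, the "openai/whisper-base" checkpoint, the mean pooling, and the head width are illustrative assumptions, not the paper's exact MEFT methods.

```python
# Minimal MEFT-style sketch for dialect identification with Whisper.
# Core idea: the frozen encoder runs under torch.no_grad(), so no backbone
# activations are kept for backpropagation; only a small head is trained.
# The class count (6) follows the abstract; everything else is illustrative.
import torch
import torch.nn as nn
from transformers import WhisperModel

NUM_DIALECTS = 6  # six Mandarin subdialects from KeSpeech, per the abstract

backbone = WhisperModel.from_pretrained("openai/whisper-base").encoder
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False  # frozen: no gradients, no optimizer state

head = nn.Sequential(                      # small trainable classifier head
    nn.Linear(backbone.config.d_model, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_DIALECTS),
)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(input_features: torch.Tensor, labels: torch.Tensor) -> float:
    """input_features: (batch, 80, 3000) log-Mel spectrograms produced by
    WhisperFeatureExtractor; labels: (batch,) dialect class indices."""
    with torch.no_grad():  # key memory saving: no backbone activations stored
        hidden = backbone(input_features).last_hidden_state  # (B, T, d_model)
    pooled = hidden.mean(dim=1)   # simple mean pooling over time frames
    logits = head(pooled)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()               # gradients flow only through the head
    optimizer.step()
    return loss.item()
```

Because gradients never propagate into the encoder, both activation memory and optimizer state shrink to the head alone; the MEFT methods the paper analyzes pursue this same goal with more sophisticated mechanisms while adapting the backbone itself.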
Similar Papers
Exploring Data and Parameter Efficient Strategies for Arabic Dialect Identifications
Computation and Language
Helps computers tell apart different Arabic dialects.
PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models
Computation and Language
Makes big AI models learn new things cheaply.
PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark
Computation and Language
Tests cheap ways to teach big AI models new things.