Bridging the Reality Gap: Efficient Adaptation of ASR systems for Challenging Low-Resource Domains
By: Darshil Chauhan, Adityasinh Solanki, Vansh Patel and more
Potential Business Impact:
Helps computers transcribe clinical speech accurately in low-resource settings.
Automatic Speech Recognition (ASR) holds immense potential to streamline clinical documentation, such as digitizing handwritten prescriptions and reports, thereby increasing patient throughput and reducing costs in resource-constrained sectors like rural healthcare. However, realizing this utility is currently obstructed by significant technical barriers: strict data privacy constraints, limited computational resources, and severe acoustic domain shifts. We quantify this gap by showing that a robust multilingual model (IndicWav2Vec) degrades to a stark 40.94% Word Error Rate (WER) when deployed on real-world clinical audio (Gram Vaani), rendering it unusable for practical applications. To address these challenges and bring ASR closer to deployment, we propose an efficient, privacy-preserving adaptation framework. We employ Low-Rank Adaptation (LoRA) to enable continual learning from incoming data streams directly on edge devices, ensuring patient data confidentiality. Our strategy yields a 17.1% relative improvement in WER on the target domain. Furthermore, by integrating multi-domain experience replay, we reduce catastrophic forgetting by 47% compared to naive adaptation. These results demonstrate a viable pathway for building reliable, self-improving ASR systems that can operate effectively within the constraints of high-impact real-world environments.
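The abstract reports its results as a Word Error Rate and as relative changes to it. As a reading aid, here is a minimal sketch of how these two quantities are conventionally computed. The function names `word_error_rate` and `relative_improvement` are illustrative and not taken from the paper, and whitespace tokenization is an assumption; the authors' evaluation pipeline may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no casing or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by adaptation."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution ("two" -> "to") and one deletion ("daily") over 4 words.
    print(word_error_rate("take two tablets daily", "take to tablets"))
    # Hypothetical numbers, chosen only to illustrate the formula.
    print(relative_improvement(40.0, 30.0))
```

If the 17.1% relative improvement is measured against the 40.94% baseline (the abstract does not state this explicitly), the adapted WER would be roughly 40.94 × (1 − 0.171) ≈ 33.9%.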