Adaptive Knowledge Distillation for Device-Directed Speech Detection
By: Hyung Gun Chi, Florian Pesce, Wonil Chang, and more
Potential Business Impact:
Makes voice assistants hear you better.
Device-directed speech detection (DDSD) is a binary classification task that separates a user's queries to a voice assistant (VA) from background speech or side conversations. This is important for achieving a naturalistic user experience. To this end, we propose knowledge distillation (KD) to enhance DDSD accuracy while ensuring efficient deployment. Specifically, we introduce a novel adaptive KD method that transfers knowledge from the general representations of a large pre-trained ASR acoustic encoder (the teacher). We apply task-specific adapters on top of the (frozen) teacher encoder, trained jointly with the student model on DDSD. We demonstrate that the proposed adaptive KD outperforms the student model without distillation on both keyword-based and keyword-free (follow-up) invocations, with relative improvements of +26% and +19% in Equal Error Rate, respectively. We also show that this approach generalizes across transformer- and conformer-based model architectures.
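The setup described above can be sketched in code: a frozen teacher encoder with a trainable task-specific adapter, a small student classifier, and a combined classification-plus-distillation loss. This is a minimal illustrative sketch, not the paper's implementation; the module names, dimensions, adapter design, and the 0.5 loss weight are all assumptions.

```python
# Hedged sketch of adaptive KD for DDSD (PyTorch). All names and
# hyperparameters here are illustrative assumptions, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter trained on top of the frozen teacher encoder."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.relu(self.down(x)))  # residual adapter

class AdaptiveKD(nn.Module):
    def __init__(self, teacher, teacher_dim=512, student_dim=128):
        super().__init__()
        self.teacher = teacher
        for p in self.teacher.parameters():       # teacher stays frozen
            p.requires_grad = False
        self.adapter = Adapter(teacher_dim)       # task-specific, trainable
        self.student = nn.GRU(40, student_dim, batch_first=True)
        self.proj = nn.Linear(student_dim, teacher_dim)  # match feature dims
        self.head = nn.Linear(student_dim, 1)     # DDSD binary classifier

    def forward(self, feats, labels):
        with torch.no_grad():
            t_repr = self.teacher(feats)          # (B, T, teacher_dim)
        t_repr = self.adapter(t_repr)             # adapt general ASR features
        s_repr, _ = self.student(feats)           # (B, T, student_dim)
        logits = self.head(s_repr.mean(dim=1)).squeeze(-1)
        cls_loss = F.binary_cross_entropy_with_logits(logits, labels)
        # Distill: align student features with the adapted teacher features.
        kd_loss = 1 - F.cosine_similarity(self.proj(s_repr), t_repr, dim=-1).mean()
        return cls_loss + 0.5 * kd_loss           # 0.5 is an assumed KD weight

# Toy stand-in for a large pre-trained ASR acoustic encoder.
teacher = nn.Sequential(nn.Linear(40, 512), nn.ReLU())
model = AdaptiveKD(teacher)
feats = torch.randn(4, 50, 40)                    # (batch, frames, mel bins)
labels = torch.randint(0, 2, (4,)).float()        # device-directed or not
loss = model(feats, labels)
```

Because the adapter is trained jointly with the student, the teacher's general ASR representations are reshaped toward the DDSD task without updating (or deploying) the large teacher itself.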
Similar Papers
Adaptive Knowledge Distillation using a Device-Aware Teacher for Low-Complexity Acoustic Scene Classification
Sound
Makes computers hear sounds from different devices.
Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
Sound
Makes noisy sounds clear for small devices.
LLM-Oriented Token-Adaptive Knowledge Distillation
Computation and Language
Makes AI learn better by focusing on hard parts.