Adaptive Knowledge Distillation using a Device-Aware Teacher for Low-Complexity Acoustic Scene Classification
By: Seung Gyu Jeong, Seong Eun Kim
In this technical report, we describe our submission for Task 1, Low-Complexity Device-Robust Acoustic Scene Classification, of the DCASE 2025 Challenge. Our work tackles the dual challenges of strict complexity constraints and robust generalization to both seen and unseen devices, while also leveraging the new rule allowing the use of device labels at test time. Our proposed system is based on a knowledge distillation framework in which an efficient CP-MobileNet student learns from a compact, specialized two-teacher ensemble. This ensemble combines a baseline PaSST teacher, trained with standard cross-entropy, and a 'generalization expert' teacher. The expert is trained with our Device-Aware Feature Alignment (DAFA) loss, adapted from prior work, which explicitly structures the feature space for device robustness. To capitalize on the availability of test-time device labels, the distilled student then undergoes a final device-specific fine-tuning stage. Our system achieves a final accuracy of 57.93% on the development set, a significant improvement over the official baseline, particularly on unseen devices.
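The core of the framework described above is distilling the student from a two-teacher ensemble while retaining a standard cross-entropy term on the ground-truth labels. A minimal sketch of such a loss is shown below; the equal teacher weighting, temperature, and mixing coefficient `alpha` are illustrative assumptions, not values taken from the report.

```python
import torch
import torch.nn.functional as F

def two_teacher_kd_loss(student_logits, teacher1_logits, teacher2_logits,
                        labels, temperature=2.0, alpha=0.5):
    """Distill a student from an averaged two-teacher ensemble.

    All hyperparameters here (temperature=2.0, alpha=0.5, equal teacher
    weights) are illustrative assumptions, not the report's settings.
    """
    # Soften each teacher's distribution and average them to form the
    # ensemble target (e.g. a baseline teacher and a 'generalization expert').
    soft_t1 = F.softmax(teacher1_logits / temperature, dim=-1)
    soft_t2 = F.softmax(teacher2_logits / temperature, dim=-1)
    soft_target = 0.5 * (soft_t1 + soft_t2)

    # KL divergence between the softened student and the ensemble target,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_target, reduction="batchmean") * temperature ** 2

    # Standard cross-entropy on the hard scene labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In practice, the device-specific fine-tuning stage mentioned in the abstract would follow this distillation step, adapting the trained student per device using the test-time device labels.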