Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation
By: Kuang Yuan, Yang Gao, Xilin Li, and more
Potential Business Impact:
Helps sound-recognition devices learn new sound categories without retraining.
Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions, lacking the transferability needed for real-world applications that require adaptation to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation, which transfers this structured knowledge to compact student models. Our evaluation shows that ContrastASC improves few-shot adaptation to unseen categories while maintaining strong closed-set performance.
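The supervised contrastive fine-tuning described in the abstract can be illustrated with a minimal sketch of the standard supervised contrastive (SupCon) loss, which pulls embeddings of same-class scenes together and pushes different classes apart. This is an illustrative NumPy implementation of the generic SupCon objective, not the paper's actual code; the function name, temperature value, and batch construction are assumptions for the example.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative sketch).

    embeddings: (N, D) float array of scene embeddings.
    labels:     (N,) int array of scene-class labels.
    """
    # L2-normalize so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Subtract the row max for numerical stability (cancels in the ratio).
    logits = sim - sim.max(axis=1, keepdims=True)
    exp_logits = np.exp(logits)
    np.fill_diagonal(exp_logits, 0.0)  # exclude self-pairs from the denominator
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))

    # Positives: other samples in the batch with the same label.
    pos_mask = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(pos_mask, 0.0)

    # Average log-probability over each anchor's positives, then negate.
    per_anchor = (pos_mask * log_prob).sum(axis=1) / np.maximum(
        pos_mask.sum(axis=1), 1.0
    )
    return -per_anchor.mean()
```

The loss is low when same-class embeddings cluster tightly (as the paper's structured embedding space aims for) and high when they are scattered; a distillation variant would apply an analogous contrastive objective between teacher and student embeddings.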
Similar Papers
Improving Acoustic Scene Classification with City Features
Sound
Helps computers hear city sounds better.
Adaptive Knowledge Distillation using a Device-Aware Teacher for Low-Complexity Acoustic Scene Classification
Sound
Makes computers hear sounds from different devices.
An Entropy-Guided Curriculum Learning Strategy for Data-Efficient Acoustic Scene Classification under Domain Shift
Sound
Teaches computers to hear sounds anywhere.