Cross-Dialect Bird Species Recognition with Dialect-Calibrated Augmentation
By: Jiani Ding, Qiyang Sun, Alican Akman and more
Potential Business Impact:
Helps computers tell bird songs apart.
Dialect variation hampers automatic recognition of bird calls collected by passive acoustic monitoring. We address the problem on DB3V, a three-region, ten-species corpus of 8-s clips, and propose a deployable framework built on Time-Delay Neural Networks (TDNNs). Frequency-sensitive normalisation (Instance Frequency Normalisation and a gated Relaxed-IFN) is paired with gradient-reversal adversarial training to learn region-invariant embeddings. A multi-level augmentation scheme combines waveform perturbations, Mixup for rare classes, and CycleGAN transfer that synthesises Region 2 (Interior Plains)-style audio, with Dialect-Calibrated Augmentation (DCA) softly down-weighting synthetic samples to limit artifacts. The complete system lifts cross-dialect accuracy by up to twenty percentage points over baseline TDNNs while preserving in-region performance. Grad-CAM and LIME analyses show that robust models concentrate on stable harmonic bands, providing ecologically meaningful explanations. The study demonstrates that lightweight, transparent, and dialect-resilient bird-sound recognition is attainable.
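As a rough illustration of two ideas named in the abstract, the sketch below shows (1) a per-frequency-bin instance normalisation of a single spectrogram and (2) a cross-entropy loss in which synthetic samples count for less than real ones. The abstract gives no formulas, so everything here is an assumption: the function names, the `eps` constant, and especially the fixed scalar `synth_weight`, which stands in for whatever calibration DCA actually uses. The gated Relaxed-IFN variant and the gradient-reversal branch are not shown.

```python
import math


def instance_freq_norm(spec, eps=1e-5):
    """Normalise each frequency bin (row) of one spectrogram over time.

    `spec` is a list of rows, one row per frequency bin, each row holding
    that bin's values across time frames. Hypothetical helper; the paper's
    exact formulation is not given in the abstract.
    """
    out = []
    for row in spec:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        scale = math.sqrt(var + eps)  # eps avoids division by zero
        out.append([(x - mean) / scale for x in row])
    return out


def weighted_cross_entropy(probs, labels, is_synthetic, synth_weight=0.5):
    """Weighted mean cross-entropy with synthetic samples down-weighted.

    `probs[i][k]` is the predicted probability of class k for sample i.
    `synth_weight` is an assumed fixed weight in (0, 1]; it is a stand-in,
    not the calibration rule used by DCA.
    """
    total = 0.0
    weight_sum = 0.0
    for p, y, synth in zip(probs, labels, is_synthetic):
        w = synth_weight if synth else 1.0
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum


# Usage: a two-bin, three-frame toy spectrogram and a two-sample batch
# in which the second sample is synthetic and poorly classified.
normed = instance_freq_norm([[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]])
loss = weighted_cross_entropy(
    probs=[[0.9, 0.1], [0.1, 0.9]],
    labels=[0, 0],
    is_synthetic=[False, True],
)
```

With a weight below 1, a badly fitted synthetic clip pulls the batch loss up less than a real clip would, which is one simple way to limit the influence of generation artifacts.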
Similar Papers
Towards High-Fidelity and Controllable Bioacoustic Generation via Enhanced Diffusion Learning
Sound
Makes clear bird sounds from noisy recordings.
Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition
CV and Pattern Recognition
Helps computers tell similar birds apart better.
Diffusion-Augmented Contrastive Learning: A Noise-Robust Encoder for Biosignal Representations
Machine Learning (CS)
Makes machines understand body signals better.