Towards Leveraging Sequential Structure in Animal Vocalizations
By: Eklavya Sarkar, Mathew Magimai.-Doss
Potential Business Impact:
Helps us understand animal communication by analyzing the order of sounds.
Animal vocalizations contain sequential structures that carry important communicative information, yet most computational bioacoustics studies average the extracted frame-level features across the temporal axis, discarding the order of the sub-units within a vocalization. This paper investigates whether discrete acoustic token sequences, derived through vector quantization and Gumbel-Softmax vector quantization of extracted self-supervised speech model representations, can effectively capture and leverage temporal information. To that end, pairwise distance analysis of token sequences generated from HuBERT embeddings shows that they can discriminate call-types and callers across four bioacoustics datasets. Sequence classification experiments using $k$-Nearest Neighbour with Levenshtein distance show that the vector-quantized token sequences yield reasonable call-type and caller classification performance, and hold promise as alternative feature representations for leveraging sequential information in animal vocalizations.
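To make the pipeline concrete, below is a minimal sketch of the classification stage described in the abstract: discrete token sequences are compared with Levenshtein distance and classified with a majority-vote $k$-Nearest Neighbour rule. This is not the authors' implementation; the function names, the plain nearest-codebook quantizer (standing in for the paper's vector quantization and Gumbel-Softmax vector quantization layers), and the toy random "embeddings" used in place of real HuBERT features are all illustrative assumptions.

```python
import numpy as np
from collections import Counter

def quantize(frames, codebook):
    """Map each frame embedding to the index of its nearest codebook vector,
    yielding a discrete token sequence for the vocalization.
    frames: (T, D) frame-level embeddings; codebook: (K, D) code vectors."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1).tolist()

def levenshtein(a, b):
    """Edit distance between two token sequences (insert/delete/substitute)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n]

def knn_predict(query, train_seqs, train_labels, k=5):
    """Classify a token sequence by majority vote over its k nearest
    training sequences under Levenshtein distance."""
    order = sorted(range(len(train_seqs)),
                   key=lambda i: levenshtein(query, train_seqs[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy usage: random vectors stand in for HuBERT frame embeddings.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))             # 8 discrete tokens, 16-dim space
train_seqs = [quantize(rng.normal(size=(20, 16)), codebook) for _ in range(10)]
train_labels = ["call_A"] * 5 + ["call_B"] * 5
query = quantize(rng.normal(size=(18, 16)), codebook)
print(knn_predict(query, train_seqs, train_labels, k=3))
```

Because Levenshtein distance compares whole sequences token by token, this classifier is sensitive to the order of sub-units, which is exactly the temporal information that mean-pooled frame features discard.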
Similar Papers
Crossing the Species Divide: Transfer Learning from Speech to Animal Sounds
Machine Learning (CS)
Lets computers understand animal sounds like speech.
Lightweight Hopfield Neural Networks for Bioacoustic Detection and Call Monitoring of Captive Primates
Sound
Listens to animal sounds to track their health.
Phonological Representation Learning for Isolated Signs Improves Out-of-Vocabulary Generalization
Computation and Language
Helps computers understand new sign language words.