Towards Leveraging Sequential Structure in Animal Vocalizations

Published: November 13, 2025 | arXiv ID: 2511.10190v1

By: Eklavya Sarkar, Mathew Magimai.-Doss

Potential Business Impact:

Helps understand animal talk by listening to sound order.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Animal vocalizations contain sequential structures that carry important communicative information, yet most computational bioacoustics studies average the extracted frame-level features across the temporal axis, discarding the order of the sub-units within a vocalization. This paper investigates whether discrete acoustic token sequences, derived through vector quantization and Gumbel-softmax vector quantization of extracted self-supervised speech model representations, can effectively capture and leverage temporal information. To that end, pairwise distance analysis of token sequences generated from HuBERT embeddings shows that they can discriminate call-types and callers across four bioacoustics datasets. Sequence classification experiments using $k$-Nearest Neighbour with Levenshtein distance show that the vector-quantized token sequences yield reasonable call-type and caller classification performance, and hold promise as alternative feature representations towards leveraging sequential information in animal vocalizations.
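The classification step named in the abstract, $k$-Nearest Neighbour over token sequences with Levenshtein distance, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the token IDs, the labels `call_A` and `call_B`, and the function names are all hypothetical, and the sketch assumes the HuBERT feature extraction and vector quantization have already produced one integer sequence per vocalization.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def knn_predict(query, train_seqs, train_labels, k=3):
    """Majority label among the k training sequences closest to the query."""
    order = sorted(range(len(train_seqs)),
                   key=lambda i: levenshtein(query, train_seqs[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical token sequences (codebook indices) and call-type labels.
train_seqs = [
    [3, 3, 7, 1], [3, 7, 7, 1], [3, 7, 1],
    [5, 2, 2, 9], [5, 2, 9, 9], [5, 5, 2, 9],
]
train_labels = ["call_A"] * 3 + ["call_B"] * 3

prediction = knn_predict([3, 7, 1, 1], train_seqs, train_labels, k=3)
print(prediction)
```

Unlike averaging frame-level features over time, the edit distance depends on token order, so two vocalizations with the same tokens in a different order are not treated as identical.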

Crossing the Species Divide: Transfer Learning from Speech to Animal Sounds

Machine Learning (CS)

Lets computers understand animal sounds like speech.

4 Sep 2025 1

Lightweight Hopfield Neural Networks for Bioacoustic Detection and Call Monitoring of Captive Primates

Sound

Listens to animal sounds to track their health.

4 Nov 2025 0

Phonological Representation Learning for Isolated Signs Improves Out-of-Vocabulary Generalization

Computation and Language

Helps computers understand new sign language words.

5 Sep 2025 0

Page Count

18 pages
