State Space Models for Bioacoustics: A Comparative Evaluation with Transformers
By: Chengyu Tang, Sanjeev Baskiyar
Potential Business Impact:
Helps computers identify animal sounds using less power.
In this study, we evaluate the efficacy of the Mamba model in the field of bioacoustics. We first pretrain a Mamba-based audio large language model (LLM), which we call BioMamba, on a large corpus of audio data using self-supervised learning. We fine-tune and evaluate BioMamba on the BEANS benchmark, a collection of diverse bioacoustic tasks including classification and detection, and compare its performance and efficiency with multiple baseline models, including AVES, a state-of-the-art Transformer-based model. The results show that BioMamba achieves performance comparable to AVES while consuming significantly less VRAM, demonstrating its potential in this domain.
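The fine-tuning stage described above follows the standard recipe of attaching a classification head to a pretrained audio encoder and training on labeled clips. A minimal PyTorch sketch is shown below; the encoder here is a placeholder standing in for the pretrained BioMamba backbone, and the shapes, class count, and hyperparameters are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Placeholder encoder standing in for a pretrained BioMamba backbone.
# (The real model weights and BEANS data loaders are not shown here.)
class DummyEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(128, dim)  # assume 128-dim audio frame features

    def forward(self, x):
        # x: (batch, time, features) -> mean-pool over time frames
        return self.proj(x).mean(dim=1)

def finetune_step(encoder, head, batch, labels, opt):
    """One supervised fine-tuning step on a bioacoustic classification task."""
    logits = head(encoder(batch))
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

encoder = DummyEncoder()
head = nn.Linear(64, 10)  # e.g. 10 species classes (illustrative)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-4
)

x = torch.randn(8, 50, 128)        # batch of 8 clips, 50 frames each
y = torch.randint(0, 10, (8,))     # random labels for the sketch
loss = finetune_step(encoder, head, x, y, opt)
```

In practice the pretrained encoder may be kept frozen (linear probing) or trained end-to-end; the benchmark comparison in the abstract concerns which backbone, Mamba or Transformer, makes this step cheaper in VRAM at comparable accuracy.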
Similar Papers
Mamba-2 audio captioning: design space exploration and analysis
Sound
Listens to sounds and describes them in words.
An Exploration of Mamba for Speech Self-Supervised Models
Computation and Language
Makes computers understand speech faster and better.
Leveraging Mamba with Full-Face Vision for Audio-Visual Speech Enhancement
Sound
Helps computers hear one voice in noisy crowds.