Multi-Representation Attention Framework for Underwater Bioacoustic Denoising and Recognition

Published: October 29, 2025 | arXiv ID: 2510.26838v1

By: Amine Razig, Youssef Soulaymani, Loubna Benabbou and more

Potential Business Impact:

Helps scientists hear whale songs in noisy oceans.

Business Areas:

Image Recognition Data and Analytics, Software

Automated monitoring of marine mammals in the St. Lawrence Estuary faces extreme challenges: calls range from low-frequency moans to ultrasonic clicks, often overlap, and are embedded in variable anthropogenic and environmental noise. We introduce a multi-step, attention-guided framework that first segments spectrograms to generate soft masks of biologically relevant energy and then fuses these masks with the raw inputs for multi-band, denoised classification. Image and mask embeddings are integrated via mid-level fusion, enabling the model to focus on salient spectrogram regions while preserving global context. Using real-world recordings from the Saguenay-St. Lawrence Marine Park Research Station in Canada, we demonstrate that segmentation-driven attention and mid-level fusion improve signal discrimination, reduce false positive detections, and produce reliable representations for operational marine mammal monitoring across diverse environmental conditions and signal-to-noise ratios. Beyond in-distribution evaluation, we further assess the generalization of Mask-Guided Classification (MGC) under distributional shifts by testing on spectrograms generated with alternative acoustic transformations. While high-capacity baseline models lose accuracy in this out-of-distribution (OOD) setting, MGC maintains stable performance, with even simple fusion mechanisms (gated, concat) achieving comparable results across distributions. This robustness highlights the capacity of MGC to learn transferable representations rather than overfitting to a specific transformation, thereby reinforcing its suitability for large-scale, real-world biodiversity monitoring. We show that in all experimental settings, the MGC framework consistently outperforms baseline architectures, yielding substantial gains in accuracy on both in-distribution and OOD data.
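The abstract names two simple fusion mechanisms (gated, concat) for combining image and mask embeddings, and describes soft masks that weight spectrogram energy. A minimal sketch of what these operations could look like follows. It is not the authors' implementation: the function names, the per-dimension gate form, and the parameters `gate_w` and `gate_b` are hypothetical choices made for illustration, since the abstract does not specify them.

```python
import math


def apply_soft_mask(spec, mask):
    """Weight each spectrogram bin by a soft mask value in [0, 1]."""
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(spec, mask)]


def concat_fusion(img_emb, mask_emb):
    """Concat fusion: stack the two embeddings into one longer vector."""
    return list(img_emb) + list(mask_emb)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(img_emb, mask_emb, gate_w, gate_b):
    """Gated fusion (assumed form): g = sigmoid(W [img; mask] + b),
    fused = g * img + (1 - g) * mask, computed per dimension.

    gate_w has one row per output dimension, each row as long as the
    concatenated embedding; gate_b has one bias per output dimension.
    """
    joint = concat_fusion(img_emb, mask_emb)
    fused = []
    for i, (a, b) in enumerate(zip(img_emb, mask_emb)):
        z = sum(w * x for w, x in zip(gate_w[i], joint)) + gate_b[i]
        g = sigmoid(z)
        fused.append(g * a + (1.0 - g) * b)
    return fused


if __name__ == "__main__":
    img = [1.0, 2.0]   # hypothetical image-branch embedding
    msk = [3.0, 4.0]   # hypothetical mask-branch embedding
    zero_w = [[0.0] * 4 for _ in range(2)]
    print(concat_fusion(img, msk))
    print(gated_fusion(img, msk, zero_w, [0.0, 0.0]))
```

With all gate weights and biases set to zero, every gate equals 0.5, so the gated output reduces to the element-wise average of the two embeddings. In a trained model the gate parameters would be learned, letting the network decide per dimension how much to rely on the mask branch.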

Ecologically Valid Benchmarking and Adaptive Attention: Scalable Marine Bioacoustic Monitoring

Sound

Helps scientists hear ocean animals better.

4 Sep 2025 0

87%

Knowledge-Augmented Vision Language Models for Underwater Bioacoustic Spectrogram Analysis

CV and Pattern Recognition

Lets computers understand whale songs without training.

6 Sep 2025 0

87%

A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition

Sound

Helps identify underwater sounds with little data.

17 Apr 2025 1

Country of Origin

🇫🇷 France

Page Count

15 pages
