Knowledge-Augmented Vision Language Models for Underwater Bioacoustic Spectrogram Analysis

Published: September 6, 2025 | arXiv ID: 2509.05703v1

By: Ragib Amin Nihal, Benjamin Yen, Takeshi Ashizawa and more

Potential Business Impact:

Lets computers interpret marine mammal calls from spectrogram images without manual annotation or model retraining.

Business Areas:
Image Recognition, Data and Analytics, Software

Marine mammal vocalization analysis depends on interpreting bioacoustic spectrograms. Vision Language Models (VLMs) are not trained on these domain-specific visualizations. We investigate whether VLMs can extract meaningful patterns from spectrograms visually. Our framework integrates VLM interpretation with LLM-based validation to build domain knowledge. This enables adaptation to acoustic data without manual annotation or model retraining.
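
The interpret-then-validate loop described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract specifies no interfaces, so `KnowledgeBase`, `build_knowledge`, and the two stand-in callables (`fake_vlm`, `fake_llm_validator`) are all hypothetical names. In practice the callables would wrap real VLM and LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

# Hypothetical interfaces (assumptions, not taken from the paper):
#   interpreter: (spectrogram image path, known facts) -> text description
#   validator:   (description, known facts) -> True if the description is accepted
Interpreter = Callable[[str, List[str]], str]
Validator = Callable[[str, List[str]], bool]


@dataclass
class KnowledgeBase:
    """Accumulates validated descriptions; stands in for 'domain knowledge'."""
    facts: List[str] = field(default_factory=list)

    def add(self, fact: str) -> None:
        # Skip exact duplicates so repeated inputs do not inflate the store.
        if fact not in self.facts:
            self.facts.append(fact)


def build_knowledge(
    spectrograms: Iterable[str],
    interpret: Interpreter,
    validate: Validator,
    kb: Optional[KnowledgeBase] = None,
) -> KnowledgeBase:
    """Run interpret -> validate over each spectrogram, keeping accepted output."""
    kb = kb if kb is not None else KnowledgeBase()
    for path in spectrograms:
        # The interpreter sees what has been accepted so far (passed as a copy).
        description = interpret(path, list(kb.facts))
        if validate(description, list(kb.facts)):
            kb.add(description)
    return kb


# Stand-in callables so the sketch runs without any model or network access.
def fake_vlm(path: str, facts: List[str]) -> str:
    return f"placeholder description for {path}"


def fake_llm_validator(description: str, facts: List[str]) -> bool:
    return description.startswith("placeholder description")


kb = build_knowledge(["a.png", "b.png"], fake_vlm, fake_llm_validator)
print(kb.facts)
```

Passing the two model calls in as plain callables keeps the loop independent of any particular model API, so no weights are updated and no labels are required, in line with the abstract's claim of adaptation without annotation or retraining.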

Country of Origin
🇯🇵 Japan

Page Count
4 pages

Category
Computer Science:
CV and Pattern Recognition