Knowledge-Augmented Vision Language Models for Underwater Bioacoustic Spectrogram Analysis
By: Ragib Amin Nihal, Benjamin Yen, Takeshi Ashizawa, and more
Potential Business Impact:
Lets computers interpret whale songs without manual labeling or model retraining.
Marine mammal vocalization analysis depends on interpreting bioacoustic spectrograms, but Vision Language Models (VLMs) are not trained on these domain-specific visualizations. We investigate whether VLMs can visually extract meaningful patterns from spectrograms. Our framework integrates VLM interpretation with LLM-based validation to build domain knowledge, enabling adaptation to acoustic data without manual annotation or model retraining.
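The abstract describes the pipeline only at a high level, so the sketch below is a loose illustration of how such a VLM-plus-LLM-validation loop could be wired up. The function names (`query_vlm`, `query_llm`), the `KnowledgeBase` class, and the example file names are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a VLM-interpretation + LLM-validation loop, assuming
# hypothetical model wrappers (query_vlm, query_llm) stand in for whatever
# backends the paper actually uses.

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Accumulates validated spectrogram interpretations (no model retraining)."""
    entries: list[str] = field(default_factory=list)

    def add(self, interpretation: str) -> None:
        self.entries.append(interpretation)

    def as_context(self) -> str:
        return "\n".join(self.entries)


def query_vlm(image_path: str, context: str) -> str:
    # Placeholder: call a vision-language model on the spectrogram image,
    # conditioning on previously validated domain knowledge.
    return f"[VLM reading of {image_path} given {len(context)} chars of context]"


def query_llm(interpretation: str) -> bool:
    # Placeholder: ask an LLM to check the interpretation for consistency
    # with known bioacoustic properties (e.g., plausible call frequency ranges).
    return True  # accept-all stub


def analyze_spectrograms(image_paths: list[str]) -> KnowledgeBase:
    kb = KnowledgeBase()
    for path in image_paths:
        reading = query_vlm(path, kb.as_context())  # visual interpretation
        if query_llm(reading):                      # LLM-based validation
            kb.add(reading)                         # grow the domain knowledge base
    return kb


if __name__ == "__main__":
    kb = analyze_spectrograms(["humpback_call_001.png", "humpback_call_002.png"])
    print(kb.as_context())
```

The key design point mirrored here is that adaptation happens by accumulating validated interpretations as context for later queries, rather than by fine-tuning any model.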
Similar Papers
Seeing isn't Hearing: Benchmarking Vision Language Models at Interpreting Spectrograms
Computation and Language
Computers can't yet "hear" sounds from pictures.
UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
CV and Pattern Recognition
Helps computers understand what's underwater.
EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence
CV and Pattern Recognition
Helps doctors find cancer faster with sound pictures.