TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics

Published: September 30, 2025 | arXiv ID: 2509.26329v1

By: Yi-Cheng Lin, Yu-Hua Chen, Jia-Kai Dong and more

Potential Business Impact:

Teaches computers to hear local sounds.

Business Areas:

Audio Media and Entertainment, Music and Audio

Large audio-language models are advancing rapidly, yet most evaluations emphasize speech or globally sourced sounds, overlooking culturally distinctive cues. This gap raises a critical question: can current models generalize to localized, non-semantic audio that communities instantly recognize but outsiders do not? To address this, we present TAU (Taiwan Audio Understanding), a benchmark of everyday Taiwanese "soundmarks." TAU is built through a pipeline combining curated sources, human editing, and LLM-assisted question generation, producing 702 clips and 1,794 multiple-choice items that cannot be solved by transcripts alone. Experiments show that state-of-the-art LALMs, including Gemini 2.5 and Qwen2-Audio, perform far below local humans. TAU demonstrates the need for localized benchmarks to reveal cultural blind spots, guide more equitable multimodal evaluation, and ensure models serve communities beyond the global mainstream.

TimeAudio: Bridging Temporal Gaps in Large Audio-Language Models

Sound

Helps computers understand exact moments in audio.

14 Nov 2025 1

89%

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

CV and Pattern Recognition

Helps computers understand who speaks in videos.

1 Dec 2025 1

89%

The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMS

Artificial Intelligence

Tests AI's ability to understand music.

21 Oct 2025 1

Page Count

5 pages
