MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
By: Yuezhang Peng, Chonghao Cai, Ziang Liu, and others
Potential Business Impact:
Helps cars better understand what drivers say.
Spoken Language Understanding (SLU), which aims to extract user semantics to execute downstream tasks, is a crucial component of task-oriented dialog systems. Existing SLU datasets generally lack sufficient diversity and complexity, and there is no unified benchmark for the latest Large Language Models (LLMs) and Large Audio Language Models (LALMs). This work introduces MAC-SLU, a novel Multi-Intent Automotive Cabin Spoken Language Understanding dataset, which increases the difficulty of the SLU task by incorporating authentic and complex multi-intent data. Based on MAC-SLU, we conduct a comprehensive benchmark of leading open-source LLMs and LALMs, covering in-context learning and supervised fine-tuning (SFT), as well as end-to-end (E2E) and pipeline paradigms. Our experiments show that while LLMs and LALMs can complete SLU tasks through in-context learning, their performance still lags significantly behind SFT. Meanwhile, E2E LALMs achieve performance comparable to pipeline approaches while avoiding error propagation from speech recognition. Code (https://github.com/Gatsby-web/MAC_SLU) and datasets (huggingface.co/datasets/Gatsby1984/MAC_SLU) are publicly released.
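To make the multi-intent setting concrete, the sketch below shows what parsing one cabin utterance into several intent/slot structures might look like. The intent and slot names, the keyword rules, and the output schema are all hypothetical illustrations, not MAC-SLU's actual label format or the paper's method (which benchmarks LLMs/LALMs rather than rules):

```python
def parse_multi_intent(utterance: str) -> dict:
    """Toy rule-based parser: split a cabin command into per-intent segments.

    A real system (as benchmarked in the paper) would use an LLM or LALM;
    this naive keyword version only illustrates the multi-intent output shape.
    """
    # Naively split coordinated commands on " and ".
    segments = [seg.strip() for seg in utterance.split(" and ")]
    intents = []
    for seg in segments:
        if "AC" in seg or "air conditioner" in seg:
            intents.append({"intent": "control_ac", "slots": {"action": "turn_on"}})
        elif "navigate" in seg:
            # Everything after "to " is treated as the destination slot.
            intents.append({"intent": "navigation",
                            "slots": {"destination": seg.split("to ")[-1]}})
        else:
            intents.append({"intent": "unknown", "slots": {}})
    return {"utterance": utterance, "intents": intents}

sample = parse_multi_intent("Turn on the AC and navigate to the nearest gas station")
print(sample["intents"])
```

A single utterance here yields two labeled intents, which is what makes multi-intent data harder than the one-utterance-one-intent setting of most existing SLU datasets.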
Similar Papers
Multi-Intent Spoken Language Understanding: Methods, Trends, and Challenges
Computation and Language
Helps computers understand when you ask for several things at once.
MSU-Bench: Towards Understanding the Conversational Multi-talker Scenarios
Audio and Speech Processing
Helps computers understand talking in noisy groups.
M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models
Computation and Language
Helps computers know who said what in talks.