Reject or Not?: A Benchmark for Voice Assistant Query Rejection in Smart Home Scenario and an Improved Method Based on LLMs
By: Huichao Men, Yizhen Hu, Yingyang He, and more
Potential Business Impact:
Helps voice assistants understand what you *really* mean.
In smart-home voice-assistant scenarios, deciding whether to accept or reject a user query is the first step before any downstream processing. To address the limited query-rejection capability of current voice assistants, this paper presents the first Chinese-oriented open-source benchmark and evaluation suite for smart homes, together with a personalized query-rejection method based on large language models. On the data side, we construct the first multimodal query-rejection dataset tailored to domestic scenarios, containing 11,913 manually labeled text-speech pairs that systematically cover twelve typical dialogue types (e.g., chit-chat, non-human sounds, valid commands, ambiguous references, device-irrelevant requests). Fine-grained labels, conversational context, and multi-turn information are provided to support both zero-shot and fine-tuning evaluation of language and multimodal large models. On the method side, we propose a three-tier collaborative architecture: first, a fine-tuned Qwen-2.5-3B adapter that models family-agnostic semantic boundaries; second, a dynamic household-level dialogue-history module that captures personalized habits; third, a household-specific RAG knowledge base that explicitly memorizes and revises past false-rejection cases. Experiments show that the proposed approach significantly outperforms zero-shot and fine-tuned general-purpose LLMs on the constructed dataset, with pronounced gains in rejection accuracy for family-specific expressions and complex multi-turn scenarios. This work provides a reproducible data foundation, an evaluation standard, and an extensible technical framework for reliability research in smart-home voice interaction.
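To make the dataset description concrete, here is a minimal sketch of what one labeled record might look like, given the abstract's mention of text-speech pairs, fine-grained labels, dialogue types, and multi-turn context. The field names and values are our own illustrative assumptions, not the authors' released schema.

```python
# Hypothetical shape of one benchmark record; all field names are
# illustrative assumptions, not the paper's actual schema.
example_record = {
    "audio_path": "clips/household_042/turn_0003.wav",  # speech side of the pair
    "transcript": "把客厅的灯调暗一点",  # text side ("dim the living-room light a bit")
    "dialogue_type": "valid_command",    # one of the twelve dialogue types
    "label": "accept",                   # accept-vs-reject ground truth
    "context": [                         # preceding turns for multi-turn evaluation
        {"speaker": "user", "text": "客厅好亮"},  # "the living room is so bright"
    ],
}
```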
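The three-tier architecture can also be sketched at inference time. The snippet below is a minimal illustration of how the tiers might fit together, assuming the fine-tuned Qwen-2.5-3B adapter is exposed as a simple callable; all class and function names (HouseholdContext, retrieve_corrections, etc.) are our assumptions, and the naive character-overlap retrieval stands in for a real embedding-based RAG index.

```python
# Hypothetical sketch of the three-tier rejection pipeline described in the
# abstract. Names are illustrative assumptions, not the authors' released code.
from dataclasses import dataclass, field

@dataclass
class HouseholdContext:
    """Per-household state: recent dialogue turns and revised false rejections."""
    history: list = field(default_factory=list)               # tier 2: recent turns
    false_reject_memory: list = field(default_factory=list)   # tier 3: (query, correction) pairs

def retrieve_corrections(query, ctx, k=3):
    """Tier 3: naive character-overlap retrieval over past false-rejection cases.
    A real system would query an embedding index (RAG) instead."""
    scored = [(len(set(query) & set(past)), correction)
              for past, correction in ctx.false_reject_memory]
    return [c for score, c in sorted(scored, reverse=True)[:k] if score > 0]

def build_prompt(query, ctx):
    """Assemble the prompt for the fine-tuned adapter (tier 1), enriched with
    household history (tier 2) and retrieved corrections (tier 3)."""
    history = "\n".join(ctx.history[-5:])
    corrections = "\n".join(retrieve_corrections(query, ctx))
    return ("Decide whether to ACCEPT or REJECT the user query.\n"
            f"Recent household dialogue:\n{history}\n"
            f"Previously corrected false rejections:\n{corrections}\n"
            f"Query: {query}\nAnswer:")

def should_reject(query, ctx, llm):
    """llm is assumed to be a callable wrapping the fine-tuned Qwen-2.5-3B adapter."""
    verdict = llm(build_prompt(query, ctx)).strip().upper()
    ctx.history.append(query)  # update tier-2 history for the next turn
    return verdict.startswith("REJECT")

# Usage with a stub model; a real deployment would wrap the actual adapter.
ctx = HouseholdContext(false_reject_memory=[("把那个开一下", "accept")])
print(should_reject("今天天气不错", ctx, lambda prompt: "REJECT"))  # True
```

The design point the abstract emphasizes is that tiers 2 and 3 are household-specific and mutable, so personalization accrues over time without retraining the tier-1 adapter.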
Similar Papers
VocalBench-zh: Decomposing and Benchmarking the Speech Conversational Abilities in Mandarin Context
Computation and Language
Tests how well computers understand spoken Chinese.
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
Computation and Language
Tests phone AI for everyday Chinese tasks.
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Computer Vision and Pattern Recognition
Helps computers understand who speaks in videos.