
AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives

Published: December 30, 2025 | arXiv ID: 2512.24052v1

By: Yanxi Chen, Wenhui Zhu, Xiwen Chen and more

Although Large Audio-Language Models (LALMs) deliver state-of-the-art (SOTA) performance, they frequently suffer from hallucinations, e.g., generating text that is not grounded in the audio input. We analyze these grounding failures and identify a distinct taxonomy: Event Omission, False Event Identity, Temporal Relation Error, and Quantitative Temporal Error. To address this, we introduce the AHA (Audio Hallucination Alignment) framework. By leveraging counterfactual hard negative mining, our pipeline constructs a high-quality preference dataset that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications. Additionally, we establish AHA-Eval, a diagnostic benchmark designed to rigorously test these fine-grained temporal reasoning capabilities. We apply this data to align Qwen2.5-Omni. The resulting model, Qwen-Audio-AHA, achieves a 13.7% improvement on AHA-Eval. Crucially, this benefit generalizes beyond our diagnostic set: the model shows substantial gains on public benchmarks, including 1.3% on MMAU-Test and 1.6% on MMAR, outperforming the latest SOTA methods.
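
As a rough illustration of what a preference pair built from a counterfactual hard negative might look like, the sketch below encodes the four error types named in the abstract and builds one pair for the Temporal Relation Error category by swapping the order of two annotated events. It is not the authors' pipeline: the `PreferencePair` record, the `temporal_swap_pair` helper, the question wording, and the clip and event names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class HallucinationType(Enum):
    """The four grounding-failure categories listed in the abstract."""
    EVENT_OMISSION = "event_omission"
    FALSE_EVENT_IDENTITY = "false_event_identity"
    TEMPORAL_RELATION_ERROR = "temporal_relation_error"
    QUANTITATIVE_TEMPORAL_ERROR = "quantitative_temporal_error"


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical record layout for one preference example."""
    audio_id: str
    question: str
    chosen: str    # answer consistent with the annotated audio events
    rejected: str  # plausible-sounding answer that contradicts the audio
    error_type: HallucinationType


def temporal_swap_pair(audio_id: str, first_event: str, second_event: str) -> PreferencePair:
    """Build a pair whose rejected answer reverses the true event order.

    Assumes the annotations say `first_event` precedes `second_event`.
    """
    question = "Which sound occurs first?"
    chosen = f"The {first_event} occurs before the {second_event}."
    rejected = f"The {second_event} occurs before the {first_event}."
    return PreferencePair(
        audio_id=audio_id,
        question=question,
        chosen=chosen,
        rejected=rejected,
        error_type=HallucinationType.TEMPORAL_RELATION_ERROR,
    )


# Example with made-up annotations for a made-up clip.
pair = temporal_swap_pair("clip_001", "dog bark", "door slam")
print(pair.chosen)
print(pair.rejected)
```

The rejected answer differs from the chosen one only in the temporal relation, so it stays linguistically plausible while contradicting the audio, which is the general property a hard negative is meant to have.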

Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering

Computation and Language

Makes AI answer legal questions truthfully and accurately.

11 Jan 2025 0

89%

Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs

Computation and Language

Finds fake answers from smart computer programs.

29 May 2025 1

89%

Multi-Modal Fact-Verification Framework for Reducing Hallucinations in Large Language Models

Artificial Intelligence

Fixes AI lies to make it more truthful.

26 Oct 2025 1
