FineDialFact: A benchmark for Fine-grained Dialogue Fact Verification
By: Xiangyan Chen, Yufeng Li, Yujian Gan, and more
Potential Business Impact:
Helps computers tell if their answers are true.
Large Language Models (LLMs) are known to produce hallucinations (factually incorrect or fabricated information), which pose significant challenges for many Natural Language Processing (NLP) applications, such as dialogue systems. As a result, detecting hallucinations has become a critical area of research. Current approaches to hallucination detection in dialogue systems primarily focus on verifying the factual consistency of generated responses. However, these responses often contain a mix of accurate, inaccurate, or unverifiable facts, making a single response-level label overly simplistic and coarse-grained. In this paper, we introduce FineDialFact, a benchmark for fine-grained dialogue fact verification, which involves verifying atomic facts extracted from dialogue responses. To support this, we construct a dataset based on publicly available dialogue datasets and evaluate various baseline methods on it. Experimental results demonstrate that methods incorporating Chain-of-Thought (CoT) reasoning can enhance performance in dialogue fact verification. Despite this, the best F1-score achieved on HybriDialogue, an open-domain dialogue dataset, is only 0.75, indicating that the benchmark remains challenging for future research. Our dataset and code will be made publicly available on GitHub.
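To make the task concrete, here is a minimal sketch of what per-fact verification looks like compared with a single response-level label. This is not the authors' pipeline: the label names, the `AtomicFact` structure, and the `verify_fact` stub are illustrative assumptions, standing in for the retrieval plus NLI or CoT-prompted LLM judging a real system would use.

```python
from dataclasses import dataclass

# Hypothetical three-way label set mirroring the abstract's description of
# responses mixing accurate, inaccurate, and unverifiable facts.
# (Names are illustrative; the benchmark's actual labels may differ.)
FACTUAL, NON_FACTUAL, UNVERIFIABLE = "factual", "non-factual", "unverifiable"

@dataclass
class AtomicFact:
    claim: str   # one minimal, self-contained statement from the response
    label: str   # FACTUAL, NON_FACTUAL, or UNVERIFIABLE

def verify_fact(claim: str, evidence: list[str]) -> str:
    """Toy verifier: exact containment against an evidence pool.
    A real pipeline would retrieve evidence and use an NLI model or a
    CoT-prompted LLM judge; this stub only illustrates the interface."""
    if any(claim.lower() in passage.lower() for passage in evidence):
        return FACTUAL
    return UNVERIFIABLE  # the stub cannot distinguish refuted from unverifiable

# Per-fact labels instead of one coarse label for the whole response:
evidence = ["Mount Fuji is the tallest mountain in Japan at 3,776 metres."]
response_facts = [
    "Mount Fuji is the tallest mountain in Japan",  # supported by the pool
    "Mount Fuji last erupted in 1707",              # true, but not in the pool
]
for fact in response_facts:
    print(fact, "->", verify_fact(fact, evidence))
```

The point of the sketch is the output granularity: a response containing both facts above would get one misleading label under response-level verification, whereas fine-grained verification labels each atomic fact separately.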
Similar Papers
FECT: Factuality Evaluation of Interpretive AI-Generated Claims in Contact Center Conversation Transcripts
Computation and Language
Verifies AI truth in customer call summaries.
FActBench: A Benchmark for Fine-grained Automatic Evaluation of LLM-Generated Text in the Medical Domain
Computation and Language
Checks if AI gives correct medical advice.
Multi-Modal Fact-Verification Framework for Reducing Hallucinations in Large Language Models
Artificial Intelligence
Fixes AI lies to make it more truthful.