Enhancing Health Fact-Checking with LLM-Generated Synthetic Data
By: Jingze Zhang, Jiahe Qian, Yiliang Zhou, and more
Potential Business Impact:
Makes online health advice more trustworthy.
Fact-checking health-related content is challenging due to the limited availability of annotated training data. In this study, we propose a synthetic data generation pipeline that leverages large language models (LLMs) to augment training data for health-related fact-checking. In this pipeline, we summarize source documents, decompose the summaries into atomic facts, and use an LLM to construct sentence-fact entailment tables. From the entailment relations in these tables, we then generate synthetic text-claim pairs with binary veracity labels. The synthetic data are combined with the original data to fine-tune a BERT-based fact-checking model. Evaluation on two public datasets, PubHealth and SciFact, shows that our pipeline improves F1 scores by up to 0.019 and 0.049, respectively, compared to models trained only on the original data. These results highlight the effectiveness of LLM-driven synthetic data augmentation in enhancing the performance of health-related fact-checkers.
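The abstract compresses the pipeline into a few clauses, so a minimal sketch may help make the steps concrete. The code below is an illustrative reconstruction under stated assumptions, not the authors' released implementation: `call_llm` is a hypothetical placeholder for any chat-completion API, and the prompts and helper names are our own.

```python
# Illustrative sketch of the pipeline described in the abstract.
# ASSUMPTION: `call_llm` stands in for any LLM chat-completion call;
# prompts and function names are hypothetical, not the authors' code.

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text reply."""
    raise NotImplementedError("wire up your LLM provider here")

def summarize(document: str) -> str:
    """Step 1: summarize a source document."""
    return call_llm(f"Summarize the following document:\n\n{document}")

def decompose_into_atomic_facts(summary: str) -> list[str]:
    """Step 2: break the summary into atomic facts, one per line."""
    reply = call_llm(
        "List each atomic fact in the text on its own line:\n\n" + summary
    )
    return [line.strip() for line in reply.splitlines() if line.strip()]

def entailment_table(sentences: list[str],
                     facts: list[str]) -> dict[tuple[int, int], bool]:
    """Step 3: ask the LLM whether each sentence entails each fact."""
    table = {}
    for i, sent in enumerate(sentences):
        for j, fact in enumerate(facts):
            reply = call_llm(
                "Does the sentence entail the fact? Answer yes or no.\n"
                f"Sentence: {sent}\nFact: {fact}"
            )
            table[(i, j)] = reply.strip().lower().startswith("yes")
    return table

def synthesize_pairs(sentences: list[str], facts: list[str],
                     table: dict[tuple[int, int], bool]):
    """Step 4: turn entailment relations into (text, claim, label) pairs."""
    return [
        (sentences[i], facts[j], 1 if entailed else 0)  # binary veracity label
        for (i, j), entailed in table.items()
    ]
```

In the paper's setup, the resulting text-claim pairs would then be concatenated with the original training set and used to fine-tune a BERT-based sequence classifier in the standard supervised fashion.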
Similar Papers
A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs
Computation and Language
Makes fake health records work at any hospital.
Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification
Computation and Language
Checks facts better by searching and thinking.
MedFact: Benchmarking the Fact-Checking Capabilities of Large Language Models on Chinese Medical Texts
Computation and Language
Helps doctors trust AI's medical advice.