Evaluation of Clinical Trials Reporting Quality using Large Language Models

Published: October 5, 2025 | arXiv ID: 2510.04338v1

By: Mathieu Laï-king, Patrick Paroubek

Potential Business Impact:

Helps clinicians and researchers automatically check whether clinical trial reports meet established reporting standards.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Reporting quality is an important aspect of clinical trial research articles, as it can impact clinical decisions. In this article, we test the ability of large language models to assess the reporting quality of such articles using the Consolidated Standards of Reporting Trials (CONSORT). We create CONSORT-QA, an evaluation corpus built from two studies of abstract reporting quality based on the CONSORT-Abstract standards. We then evaluate the ability of different large generative language models (general-domain or adapted to the biomedical domain) to correctly assess CONSORT criteria using several known prompting methods, including chain-of-thought. Our best combination of model and prompting method achieves 85% accuracy. Using chain-of-thought also adds valuable information on the model's reasoning for completing the task.
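The setup described above can be sketched as a small evaluation harness: for each (abstract, criterion) pair, a chain-of-thought prompt asks the model to reason and then answer yes/no, and accuracy is computed against gold labels. This is a minimal illustrative sketch, not the authors' code; the prompt wording, the `model` callable, and the toy keyword-based model below are all hypothetical stand-ins for a real LLM call.

```python
# Hypothetical chain-of-thought prompt template for one CONSORT-abstract criterion.
COT_PROMPT = (
    "Abstract:\n{abstract}\n\n"
    "Criterion: {criterion}\n"
    "Think step by step, then answer 'yes' if the abstract reports this "
    "criterion and 'no' otherwise.\nAnswer:"
)

def evaluate(model, examples):
    """Return accuracy of `model` on (abstract, criterion, gold_label) triples.

    `model` is any callable mapping a prompt string to a text reply; a real
    run would wrap an LLM API call here.
    """
    correct = 0
    for abstract, criterion, gold in examples:
        prompt = COT_PROMPT.format(abstract=abstract, criterion=criterion)
        reply = model(prompt).strip().lower()
        # Take the final line of the reply as the answer (after any reasoning).
        pred = "yes" if "yes" in reply.splitlines()[-1] else "no"
        correct += pred == gold
    return correct / len(examples)

# Toy stand-in "model": answers yes only if the abstract mentions randomisation.
toy_model = lambda p: "yes" if "randomised to" in p else "no"

examples = [
    ("Patients were randomised to drug or placebo.",
     "Trial design", "yes"),
    ("An observational cohort was followed for 5 years.",
     "Trial design", "no"),
]
print(evaluate(toy_model, examples))  # 1.0
```

Separating the prompt template from the scoring loop makes it easy to swap in other prompting methods (zero-shot, few-shot) for comparison, as the paper does across models and prompts.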

Repos / Data Links

Page Count
16 pages

Category
Computer Science:
Computation and Language