Evaluation of Clinical Trials Reporting Quality using Large Language Models
By: Mathieu Laï-king, Patrick Paroubek
Potential Business Impact:
Helps doctors check if medical study reports are complete.
Reporting quality is an important aspect of clinical trial research articles, as it can influence clinical decisions. In this article, we test the ability of large language models to assess the reporting quality of this type of article using the Consolidated Standards of Reporting Trials (CONSORT). We create CONSORT-QA, an evaluation corpus derived from two studies of abstract reporting quality based on the CONSORT-abstract standards. We then evaluate how well different large generative language models (from the general domain or adapted to the biomedical domain) assess CONSORT criteria using several known prompting methods, including Chain-of-thought. Our best combination of model and prompting method achieves 85% accuracy. Using Chain-of-thought also provides valuable insight into the model's reasoning when completing the task.
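The abstract does not give the exact prompt wording or model interface used. As a rough illustration only, the sketch below shows how a Chain-of-thought style check of a single CONSORT-abstract criterion could be set up; the API (an OpenAI-compatible chat client), the model name, and the criterion text are all assumptions for the example, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): asking an LLM to judge one
# CONSORT-abstract criterion with Chain-of-thought style reasoning.
# Assumes an OpenAI-compatible chat API and an illustrative criterion.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITERION = (  # illustrative CONSORT-abstract item, paraphrased for the example
    "Does the abstract describe how participants were allocated to "
    "interventions (e.g., randomisation)?"
)

def assess_criterion(abstract_text: str, criterion: str = CRITERION) -> str:
    """Ask the model to reason step by step, then answer Yes/No for one criterion."""
    prompt = (
        "You are assessing the reporting quality of a clinical trial abstract.\n"
        f"Abstract:\n{abstract_text}\n\n"
        f"Criterion: {criterion}\n"
        "Think step by step about what the abstract reports, "
        "then answer 'Yes' or 'No' on the final line."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model, not one evaluated in the paper
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```

In practice, the final Yes/No line would be parsed and compared against the human annotations in CONSORT-QA to compute accuracy, while the preceding reasoning steps give the kind of interpretable justification the abstract attributes to Chain-of-thought prompting.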
Similar Papers
Evaluating the Ability of Large Language Models to Identify Adherence to CONSORT Reporting Guidelines in Randomized Controlled Trials: A Methodological Evaluation Study
Computation and Language
Helps check if medical study reports are complete.
A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist
Artificial Intelligence
Helps doctors check if medical studies follow rules.
Evaluating Large Language Models for Evidence-Based Clinical Question Answering
Computation and Language
Helps doctors answer patient questions better.