Plain language adaptations of biomedical text using LLMs: Comparison of evaluation metrics

Published: December 18, 2025 | arXiv ID: 2512.16530v1

By: Primoz Kocbek, Leon Kopitar, Gregor Stiglic

Potential Business Impact:

Makes doctor's notes easy for anyone to read.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

This study investigated the application of Large Language Models (LLMs) for simplifying biomedical texts to enhance health literacy. Using a public dataset of plain language adaptations of biomedical abstracts, we developed and evaluated several approaches: a baseline approach using a prompt template, a two-agent AI approach, and a fine-tuning (FT) approach. We selected OpenAI's gpt-4o and gpt-4o-mini models as baselines for further research. We evaluated our approaches with quantitative metrics, such as Flesch-Kincaid grade level, SMOG Index, SARI, BERTScore, and G-Eval, as well as with qualitative metrics, specifically 5-point Likert scales for simplicity, accuracy, completeness, and brevity. Results showed superior performance of gpt-4o-mini and underperformance of the FT approaches. G-Eval, an LLM-based quantitative metric, showed promising results, ranking the approaches similarly to the qualitative metrics.
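The quantitative metrics named above can be computed with off-the-shelf Python packages. Below is a minimal sketch, assuming the textstat, evaluate, and bert-score packages; the sample sentences are illustrative placeholders, not texts from the paper's dataset, and this is not the authors' evaluation code.

# pip install textstat evaluate bert-score
import textstat
import evaluate

# Illustrative placeholder texts (not from the paper's dataset).
source = ["Myocardial infarction results from occlusion of a coronary artery."]
prediction = ["A heart attack happens when a blood vessel to the heart gets blocked."]
references = [["A heart attack occurs when an artery feeding the heart is blocked."]]

# Readability: lower grade levels indicate simpler text.
# Note: SMOG is intended for longer passages; on very short texts textstat may return 0.
fk_grade = textstat.flesch_kincaid_grade(prediction[0])
smog = textstat.smog_index(prediction[0])

# SARI compares the system output against both the source and reference simplifications.
sari = evaluate.load("sari")
sari_score = sari.compute(sources=source, predictions=prediction, references=references)

# BERTScore measures semantic similarity between output and reference.
bertscore = evaluate.load("bertscore")
bs = bertscore.compute(predictions=prediction, references=[references[0][0]], lang="en")

print(f"Flesch-Kincaid grade: {fk_grade:.1f}, SMOG: {smog:.1f}")
print(f"SARI: {sari_score['sari']:.1f}, BERTScore F1: {bs['f1'][0]:.3f}")

In practice, such metrics would be averaged over all abstracts in the evaluation set; the G-Eval and Likert-scale ratings described in the abstract require an LLM judge and human annotators, respectively, and are not covered by this sketch.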

Country of Origin
🇸🇮 Slovenia

Page Count
5 pages

Category
Computer Science:
Computation and Language