Almost Clinical: Linguistic properties of synthetic electronic health records

Published: January 3, 2026 | arXiv ID: 2601.01171v1

By: Serge Sharoff, John Baker, David Francis Hunt, and more

Potential Business Impact:

Makes fake doctor notes that sound real.

Business Areas:

Electronic Health Record (EHR), Health Care

This study evaluates the linguistic and clinical suitability of synthetic electronic health records (EHRs) in mental health. First, we describe the rationale and methodology for creating the synthetic corpus. Second, we assess agency, modality, and information flow across four clinical genres (Assessments, Correspondence, Referrals, and Care Plans) to understand how LLMs construct medical authority and patient agency through their grammatical and linguistic choices. LLMs produce coherent, terminologically appropriate texts that approximate clinical practice. However, systematic divergences remain, including registerial shifts, insufficient clinical specificity, and inaccuracies in medication use and diagnostic procedures.