Almost Clinical: Linguistic properties of synthetic electronic health records
By: Serge Sharoff, John Baker, David Francis Hunt and more
Potential Business Impact:
Makes fake doctor notes that sound real.
This study evaluates the linguistic and clinical suitability of synthetic electronic health records (EHRs) in the field of mental health. First, we describe the rationale and the methodology for creating the synthetic corpus. Second, we assess agency, modality, and information flow across four clinical genres (Assessments, Correspondence, Referrals, and Care plans) to understand how LLMs grammatically construct medical authority and patient agency through linguistic choices. While LLMs produce coherent, terminology-appropriate texts that approximate clinical practice, systematic divergences remain, including registerial shifts, insufficient clinical specificity, and inaccuracies in medication use and diagnostic procedures.