ACE-ICD: Acronym Expansion As Data Augmentation For Automated ICD Coding
By: Tuan-Dung Le, Shohreh Haddadan, Thanh Q. Thieu
Potential Business Impact:
Helps doctors quickly label patient illnesses and treatments.
Automatic ICD coding, the task of assigning disease and procedure codes to electronic medical records, is crucial for clinical documentation and billing. While existing methods primarily enhance model understanding of code hierarchies and synonyms, they often overlook the pervasive use of medical acronyms in clinical notes, a key factor in ICD code inference. To address this gap, we propose a novel and effective data augmentation technique that leverages large language models to expand medical acronyms, allowing models to be trained on their full-form representations. Moreover, we incorporate consistency training to regularize predictions by enforcing agreement between the original and augmented documents. Extensive experiments on the MIMIC-III dataset demonstrate that our approach, ACE-ICD, establishes new state-of-the-art performance across multiple settings, including common codes, rare codes, and full-code assignments. Our code is publicly available.
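The two ideas in the abstract, acronym expansion and consistency training, can be illustrated with a minimal sketch. It is not the authors' implementation. The `ACRONYMS` lookup table is a hypothetical stand-in for the large language model the paper uses for expansion, and the symmetric per-label KL divergence in `consistency_loss` is an assumed form of the agreement term, since the abstract does not specify one.

```python
import math
import re

# Hypothetical acronym table. The paper expands acronyms with a large
# language model; this lookup only stands in for that step.
ACRONYMS = {
    "CHF": "congestive heart failure",
    "COPD": "chronic obstructive pulmonary disease",
    "MI": "myocardial infarction",
}


def expand_acronyms(note, table=ACRONYMS):
    """Replace whole-word, case-sensitive acronyms with their full forms."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], note)


def bernoulli_kl(p, q, eps=1e-8):
    """KL divergence between two Bernoulli distributions with means p and q."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def consistency_loss(probs_original, probs_augmented):
    """Symmetric KL averaged over labels (an assumed form of the regularizer).

    Each argument holds per-code probabilities predicted for the original
    note and for its acronym-expanded version, respectively.
    """
    pairs = list(zip(probs_original, probs_augmented))
    total = sum(bernoulli_kl(p, q) + bernoulli_kl(q, p) for p, q in pairs)
    return total / (2 * len(pairs))


augmented = expand_acronyms("Pt with CHF and COPD.")
loss = consistency_loss([0.9, 0.1], [0.7, 0.2])
```

In a training loop, a term like this would be added to the usual multi-label classification loss, so that the model is penalized when its predictions for a note and its expanded version disagree.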
Similar Papers
Synthetic Clinical Notes for Rare ICD Codes: A Data-Centric Framework for Long-Tail Medical Coding
Computation and Language
Teaches computers to find rare diseases in notes.
DACE For Railway Acronym Disambiguation
Computation and Language
Helps computers understand train jargon better.
LTR-ICD: A Learning-to-Rank Approach for Automatic ICD Coding
Machine Learning (CS)
Helps doctors sort patient sickness codes faster.