An Ensemble Classification Approach in a Multi-Layered Large Language Model Framework for Disease Prediction
By: Ali Hamdi, Malak Mohamed, Rokaia Emad, and more
Potential Business Impact:
Helps doctors find diseases from online patient messages.
Social telehealth has made remarkable progress in healthcare by allowing patients to post symptoms and participate in medical consultations remotely. Users frequently post symptoms on social media and online health platforms, creating a large repository of medical data that can be leveraged for disease classification. Large language models (LLMs) such as LLaMA 3 and GPT-3.5, along with transformer-based models like BERT, have demonstrated strong capabilities in processing complex medical text. In this study, we evaluate three Arabic medical text preprocessing methods, namely summarization, refinement, and Named Entity Recognition (NER), before applying fine-tuned Arabic transformer models (CAMeLBERT, AraBERT, and AsafayaBERT). To enhance robustness, we adopt a majority-voting ensemble that combines predictions from the original and preprocessed text representations. This approach achieved the best classification accuracy of 80.56%, demonstrating its effectiveness in leveraging diverse text representations and model predictions to improve the understanding of medical texts. To the best of our knowledge, this is the first work to integrate LLM-based preprocessing with fine-tuned Arabic transformer models and ensemble learning for disease classification in Arabic social telehealth data.
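The majority-voting step can be sketched as below. This is a minimal illustration, not the paper's actual pipeline: the disease labels and the idea of one prediction per text representation (original, summarized, refined, NER-extracted) are illustrative assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label chosen by the most classifiers.

    For equal counts, Counter.most_common keeps first-seen order
    (CPython 3.7+), so ties break toward the earliest prediction.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical example: four predictions, one per text representation
# fed to a fine-tuned Arabic transformer (labels are made up).
preds = ["diabetes", "diabetes", "hypertension", "diabetes"]
print(majority_vote(preds))  # diabetes
```

In practice the ensemble would collect one predicted label per (representation, model) pair and apply the same vote.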
Similar Papers
Arabic Large Language Models for Medical Text Generation
Computation and Language
Helps doctors give better advice in Arabic.
Benchmarking the Medical Understanding and Reasoning of Large Language Models in Arabic Healthcare Tasks
Computation and Language
Helps computers understand Arabic medical questions.
Multi-Label Clinical Text Eligibility Classification and Summarization System
Computation and Language
Helps doctors find the right patients for studies.