Zero-shot Performance of Generative AI in Brazilian Portuguese Medical Exam
By: Cesar Augusto Madid Truyts, Amanda Gomes Rabelo, Gabriel Mesquita de Souza and more
Potential Business Impact:
Generative AI models can answer Brazilian Portuguese medical exam questions, with the best reaching accuracy comparable to human candidates.
Artificial intelligence (AI) has shown the potential to revolutionize healthcare by improving diagnostic accuracy, optimizing workflows, and personalizing treatment plans. Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have achieved notable advancements in natural language processing and medical applications. However, the evaluation of these models has focused predominantly on the English language, leading to potential biases in their performance across different languages. This study investigates the capability of six LLMs (GPT-4.0 Turbo, LLaMA-3-8B, LLaMA-3-70B, Mixtral 8x7B Instruct, Titan Text G1-Express, and Command R+) and four MLLMs (Claude-3.5-Sonnet, Claude-3-Opus, Claude-3-Sonnet, and Claude-3-Haiku) to answer questions written in Brazilian Portuguese from the medical residency entrance exam of the Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (HCFMUSP), the largest health complex in South America. The performance of the models was benchmarked against human candidates, analyzing accuracy, processing time, and coherence of the generated explanations. The results show that while some models, particularly Claude-3.5-Sonnet and Claude-3-Opus, achieved accuracy levels comparable to human candidates, performance gaps persist, particularly in multimodal questions requiring image interpretation. Furthermore, the study highlights language disparities, emphasizing the need for further fine-tuning and dataset augmentation for non-English medical AI applications. Our findings reinforce the importance of evaluating generative AI across diverse linguistic and clinical settings to ensure fair and reliable deployment in healthcare. Future research should explore better training methodologies, stronger multimodal reasoning, and real-world clinical integration of AI-driven medical assistance.