Zero-shot Performance of Generative AI in Brazilian Portuguese Medical Exam

Published: July 26, 2025 | arXiv ID: 2507.19885v1

By: Cesar Augusto Madid Truyts, Amanda Gomes Rabelo, Gabriel Mesquita de Souza and more

Potential Business Impact:

AI helps doctors in Brazil understand medical questions.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Artificial intelligence (AI) has shown the potential to revolutionize healthcare by improving diagnostic accuracy, optimizing workflows, and personalizing treatment plans. Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have achieved notable advances in natural language processing and medical applications. However, evaluation of these models has focused predominantly on English, which may bias their performance in other languages. This study investigates the ability of six LLMs (GPT-4.0 Turbo, LLaMA-3-8B, LLaMA-3-70B, Mixtral 8x7B Instruct, Titan Text G1-Express, and Command R+) and four MLLMs (Claude-3.5-Sonnet, Claude-3-Opus, Claude-3-Sonnet, and Claude-3-Haiku) to answer questions written in Brazilian Portuguese from the medical residency entrance exam of the Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (HCFMUSP), the largest health complex in South America. Model performance was benchmarked against that of human candidates in terms of accuracy, processing time, and coherence of the generated explanations. The results show that while some models, particularly Claude-3.5-Sonnet and Claude-3-Opus, achieved accuracy comparable to human candidates, performance gaps persist, especially on multimodal questions that require image interpretation. The study also highlights language disparities, emphasizing the need for further fine-tuning and dataset augmentation for non-English medical AI applications. Our findings reinforce the importance of evaluating generative AI across varied linguistic and clinical settings to ensure fair and reliable deployment in healthcare. Future research should explore improved training methodologies, stronger multimodal reasoning, and real-world clinical integration of AI-driven medical assistance.
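
As a rough illustration of the kind of benchmark the abstract describes, the Python sketch below computes per-model accuracy and mean processing time from multiple-choice responses. It is not the authors' evaluation code: the `Response` record, the `summarize` function, the model labels, and the demo records are all hypothetical and chosen only to show the arithmetic. Coherence of explanations, which the study also assessed, is not covered here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Response:
    """One model answer to one multiple-choice question (hypothetical schema)."""
    model: str      # model label, e.g. "model_a" (placeholder name)
    predicted: str  # option letter the model chose
    correct: str    # option letter from the answer key
    seconds: float  # time the model took to answer


def summarize(responses):
    """Return {model: (accuracy, mean_seconds)} for a list of Response records."""
    grouped = defaultdict(list)
    for r in responses:
        grouped[r.model].append(r)

    summary = {}
    for model, rows in grouped.items():
        # Compare letters case-insensitively and ignore stray whitespace.
        hits = sum(
            r.predicted.strip().upper() == r.correct.strip().upper()
            for r in rows
        )
        accuracy = hits / len(rows)
        mean_seconds = sum(r.seconds for r in rows) / len(rows)
        summary[model] = (accuracy, mean_seconds)
    return summary


# Made-up records purely for illustration; these are not data from the study.
demo = [
    Response("model_a", "A", "A", 2.0),
    Response("model_a", "c", "C", 4.0),
    Response("model_a", "B", "D", 3.0),
    Response("model_a", "D", "D", 3.0),
    Response("model_b", "A", "B", 1.0),
    Response("model_b", "B", "B", 1.0),
]
result = summarize(demo)
for model, (accuracy, mean_seconds) in sorted(result.items()):
    print(f"{model}: accuracy={accuracy:.2f}, mean_time={mean_seconds:.1f}s")
```

In a comparison against human candidates, the same accuracy figure would be computed over the candidates' answer sheets so that both sides are scored against one answer key.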

Performance of Large Language Models in Supporting Medical Diagnosis and Treatment

Computation and Language

AI helps doctors diagnose illnesses and plan treatments.

14 Apr 2025 0

91%

Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks

Computation and Language

New AI helps doctors more than old AI.

1 Dec 2025 1

90%

Benchmarking the Medical Understanding and Reasoning of Large Language Models in Arabic Healthcare Tasks

Computation and Language

Helps computers understand Arabic medical questions.

13 Aug 2025 1

Page Count

26 pages
