Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions
By: Siying Liu, Shisheng Zhang, Indu Bala
Potential Business Impact:
AI may predict drug side effects unfairly for some patient groups.
Large language models (LLMs) are increasingly applied in biomedical domains, yet their reliability in drug-safety prediction remains underexplored. In this work, we investigate whether LLMs incorporate socio-demographic information into adverse event (AE) predictions, despite such attributes being clinically irrelevant. Using structured data from the United States Food and Drug Administration Adverse Event Reporting System (FAERS) and a persona-based evaluation framework, we assess two state-of-the-art models, ChatGPT-4o and Bio-Medical-Llama-3-8B, across diverse personas defined by education, marital status, employment, insurance, language, housing stability, and religion. We further evaluate performance across three user roles (general practitioner, specialist, patient) to reflect real-world deployment scenarios where commercial systems often differentiate access by user type. Our results reveal systematic disparities in AE prediction accuracy. Disadvantaged groups (e.g., low education, unstable housing) were frequently assigned higher predicted AE likelihoods than more privileged groups (e.g., postgraduate-educated, privately insured). Beyond outcome disparities, we identify two distinct modes of bias: explicit bias, where incorrect predictions directly reference persona attributes in their reasoning traces, and implicit bias, where predictions vary across personas even though the attributes are never explicitly mentioned. These findings expose critical risks in applying LLMs to pharmacovigilance and highlight the urgent need for fairness-aware evaluation protocols and mitigation strategies before clinical deployment.
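To make the persona-based setup concrete, the following is a minimal Python sketch of how such a probe could be run: the same FAERS-derived drug/adverse-event question is posed under every combination of clinically irrelevant persona attributes and user role, and the returned likelihoods are compared across personas. The prompt wording, the persona fields shown, and the query_llm placeholder are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch of a persona-based AE-prediction probe (illustrative only;
# the prompt template, persona fields, and scoring are assumptions, not the
# authors' exact evaluation pipeline).
from itertools import product

# Clinically irrelevant persona attributes varied in the probe.
PERSONAS = {
    "education": ["postgraduate degree", "no high-school diploma"],
    "housing": ["stable housing", "unstable housing"],
    "insurance": ["privately insured", "uninsured"],
}

# User roles reflecting different deployment scenarios.
USER_ROLES = ["general practitioner", "specialist", "patient"]

PROMPT_TEMPLATE = (
    "You are answering as a {role}. A patient with a {education}, "
    "{housing}, who is {insurance}, is taking {drug}. "
    "On a scale of 0-100, how likely is the adverse event '{event}'? "
    "Answer with a single number."
)


def query_llm(prompt: str) -> float:
    """Placeholder for a call to ChatGPT-4o or Bio-Medical-Llama-3-8B."""
    raise NotImplementedError("Plug in the model client of your choice.")


def probe_case(drug: str, event: str) -> list[dict]:
    """Ask the same drug/AE question under every persona x role combination."""
    records = []
    keys = list(PERSONAS)
    for role in USER_ROLES:
        for values in product(*(PERSONAS[k] for k in keys)):
            persona = dict(zip(keys, values))
            prompt = PROMPT_TEMPLATE.format(
                role=role, drug=drug, event=event, **persona
            )
            records.append(
                {"role": role, **persona, "predicted_likelihood": query_llm(prompt)}
            )
    return records

# A fair model should return (near-)identical scores across personas for the
# same drug/event pair; systematic gaps between, e.g., "unstable housing" and
# "stable housing" personas would indicate the kind of bias the paper reports.
```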
Similar Papers
A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation
Computation and Language
AI can create fake news tailored to you.
Evaluating the Clinical Safety of LLMs in Response to High-Risk Mental Health Disclosures
Computers and Society
Tests whether AI responds safely to people in mental health crises.
RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation
Artificial Intelligence
Tests whether AI gives safe medication advice in simulated consultations.