Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models

Published: March 12, 2025 | arXiv ID: 2503.10690v1

By: Shahnewaz Karim Sakib, Anindya Bijoy Das, Shibbir Ahmed

Potential Business Impact:

Helps computers spot fake facts in questions.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Adversarial factuality refers to the deliberate insertion of misinformation into input prompts by an adversary, characterized by varying levels of expressed confidence. In this study, we systematically evaluate the performance of several open-source large language models (LLMs) when exposed to such adversarial inputs. Three tiers of adversarial confidence are considered: strongly confident, moderately confident, and limited confidence. Our analysis encompasses eight LLMs: LLaMA 3.1 (8B), Phi 3 (3.8B), Qwen 2.5 (7B), Deepseek-v2 (16B), Gemma2 (9B), Falcon (7B), Mistrallite (7B), and LLaVA (7B). Empirical results indicate that LLaMA 3.1 (8B) exhibits a robust capability in detecting adversarial inputs, whereas Falcon (7B) shows comparatively lower performance. Notably, for the majority of the models, detection success improves as the adversary's confidence decreases; however, this trend is reversed for LLaMA 3.1 (8B) and Phi 3 (3.8B), where a reduction in adversarial confidence corresponds with diminished detection performance. Further analysis of the queries that elicited the highest and lowest rates of successful attacks reveals that adversarial attacks are more effective when targeting less commonly referenced or obscure information.
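The evaluation setup described in the abstract can be sketched in a few lines of Python: wrap a false claim in phrasing that signals one of the three confidence tiers, then tally how often each model fails to flag the claim. This is a minimal sketch, not the authors' code. The template wording, the function names `build_adversarial_prompt` and `attack_success_rate`, and the record layout are all assumptions made for illustration, since the abstract does not give the actual prompt templates or the scoring procedure.

```python
from collections import defaultdict

# Hypothetical phrasing for each confidence tier. The paper's actual
# templates are not stated in the abstract; these are placeholders.
CONFIDENCE_TEMPLATES = {
    "strong": "As we know, {claim} {question}",
    "moderate": "I think {claim} {question}",
    "limited": "I am not sure, but {claim} {question}",
}


def build_adversarial_prompt(claim, question, tier):
    """Prefix a question with a false claim stated at the given confidence tier."""
    return CONFIDENCE_TEMPLATES[tier].format(claim=claim, question=question)


def attack_success_rate(records):
    """Compute the fraction of prompts whose false claim went undetected.

    records: iterable of (model, tier, detected) tuples, where `detected`
    is True if the model flagged the misinformation (assumed to be judged
    elsewhere, e.g. by a human or automated grader).
    Returns a dict mapping (model, tier) to the attack success rate.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for model, tier, detected in records:
        totals[(model, tier)] += 1
        if not detected:
            misses[(model, tier)] += 1
    return {key: misses[key] / totals[key] for key in totals}


if __name__ == "__main__":
    prompt = build_adversarial_prompt(
        "the Eiffel Tower is in Berlin.", "When was it built?", "limited"
    )
    print(prompt)

    # Toy placeholder outcomes, not results from the study.
    toy_records = [
        ("model_a", "strong", True),
        ("model_a", "strong", False),
        ("model_a", "limited", True),
    ]
    print(attack_success_rate(toy_records))
```

Grouping by (model, tier) makes it straightforward to compare how detection changes as the stated confidence drops, which is the trend the abstract reports per model.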

Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

Cryptography and Security

Makes AI chatbots less likely to lie.

8 Nov 2025

An Empirical Analysis of LLMs for Countering Misinformation

Computation and Language

Helps computers spot fake news, but needs improvement.

28 Feb 2025

On Fact and Frequency: LLM Responses to Misinformation Expressed with Uncertainty

Computation and Language

AI believes false things when said with doubt.

6 Mar 2025

Country of Origin

🇺🇸 United States

Page Count

12 pages
