Score: 0

Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?

Published: October 28, 2025 | arXiv ID: 2510.24236v1

By: Teague McMillan , Gabriele Dominici , Martin Gjoreski and more

Potential Business Impact:

Makes AI give honest reasons for its answers.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Large Language Models (LLMs) often produce explanations that do not faithfully reflect the factors driving their predictions. In healthcare settings, such unfaithfulness is especially problematic: explanations that omit salient clinical cues or mask spurious shortcuts can undermine clinician trust and lead to unsafe decision support. We study how inference and training-time choices shape explanation faithfulness, focusing on factors practitioners can control at deployment. We evaluate three LLMs (GPT-4.1-mini, LLaMA 70B, LLaMA 8B) on two datasets-BBQ (social bias) and MedQA (medical licensing questions), and manipulate the number and type of few-shot examples, prompting strategies, and training procedure. Our results show: (i) both the quantity and quality of few-shot examples significantly impact model faithfulness; (ii) faithfulness is sensitive to prompting design; (iii) the instruction-tuning phase improves measured faithfulness on MedQA. These findings offer insights into strategies for enhancing the interpretability and trustworthiness of LLMs in sensitive domains.

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations

Computation and Language

Checks if AI's answers are honest.

19 Apr 2025 3

91%

Can LLMs Faithfully Explain Themselves in Low-Resource Languages? A Case Study on Emotion Detection in Persian

Computation and Language

Makes AI explain its thoughts more honestly.

24 Nov 2025 1

90%

Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations

Computation and Language

Checks if AI explanations for X-rays are truthful.

13 Oct 2025 3

View PDF Login to Bookmark

Country of Origin

🇨🇭 Switzerland

Page Count

14 pages

Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?

Makes AI give honest reasons for its answers.

Technical Abstract

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations

Can LLMs Faithfully Explain Themselves in Low-Resource Languages? A Case Study on Emotion Detection in Persian

Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations