Guiding LLMs to Generate High-Fidelity and High-Quality Counterfactual Explanations for Text Classification
By: Van Bach Nguyen, Christin Seifert, Jörg Schlötterer
Potential Business Impact:
Makes AI explain its decisions with small changes.
The need for interpretability in deep learning has driven interest in counterfactual explanations, which identify minimal changes to an instance that change a model's prediction. Current counterfactual (CF) generation methods require task-specific fine-tuning and produce low-quality text. Large Language Models (LLMs), though effective for high-quality text generation, struggle with label-flipping counterfactuals (i.e., counterfactuals that change the prediction) without fine-tuning. We introduce two simple classifier-guided approaches to support counterfactual generation by LLMs, eliminating the need for fine-tuning while preserving the strengths of LLMs. Despite their simplicity, our methods outperform state-of-the-art counterfactual generation methods and are effective across different LLMs, highlighting the benefits of guiding counterfactual generation by LLMs with classifier information. We further show that data augmentation by our generated CFs can improve a classifier's robustness. Our analysis reveals a critical issue in counterfactual generation by LLMs: LLMs rely on parametric knowledge rather than faithfully following the classifier.
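The abstract does not detail the two classifier-guided approaches, but the core idea of steering an LLM with classifier feedback can be sketched as below. This is a minimal illustration under assumed interfaces, not the authors' implementation: `llm_generate` and `classify` are hypothetical stand-ins for an LLM call and for the classifier being explained.

```python
# Minimal sketch (assumption, not the paper's method): iteratively prompt an
# LLM for a small edit and use the classifier's prediction as guidance until
# the label flips.
from typing import Callable, Optional


def generate_counterfactual(
    text: str,
    target_label: str,
    llm_generate: Callable[[str], str],  # hypothetical LLM wrapper: prompt -> rewritten text
    classify: Callable[[str], str],      # hypothetical classifier: text -> predicted label
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask the LLM for minimal edits, feeding back the classifier's current
    prediction, until the prediction flips to `target_label` or the budget runs out."""
    candidate = text
    for _ in range(max_rounds):
        prompt = (
            f"Rewrite the text with as few changes as possible so that a text "
            f"classifier would predict '{target_label}'.\n"
            f"Current classifier prediction: '{classify(candidate)}'.\n"
            f"Text: {candidate}"
        )
        candidate = llm_generate(prompt)
        if classify(candidate) == target_label:
            return candidate  # label flipped: a valid (label-flipping) counterfactual
    return None  # no label-flipping edit found within the budget
```

In this sketch the classifier, not the LLM's own parametric knowledge, decides when to stop, which reflects the paper's motivation that unguided LLMs often fail to actually flip the classifier's prediction.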
Similar Papers
Explaining Fine Tuned LLMs via Counterfactuals: A Knowledge Graph Driven Framework
Machine Learning (CS)
Explains how smart programs learn new skills.
Counterfactual reasoning: an analysis of in-context emergence
Computation and Language
Helps computers guess what happens if things change.
LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations
Machine Learning (CS)
AI explanations can be wrong or misleading.