Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference

Published: March 6, 2025 | arXiv ID: 2503.05047v1

By: Grace Proebsting, Adam Poliak

Potential Business Impact:

Finds hidden bias in AI-written text.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

We test whether NLP datasets created with Large Language Models (LLMs) contain annotation artifacts and social biases, as NLP datasets elicited from crowdsourced workers do. We recreate a portion of the Stanford Natural Language Inference corpus using GPT-4, Llama-2 70b Chat, and Mistral 7b Instruct. We train hypothesis-only classifiers to determine whether the LLM-elicited NLI datasets contain annotation artifacts. Next, we use pointwise mutual information (PMI) to identify the words in each dataset that are most associated with gender-, race-, and age-related terms. On our LLM-generated NLI datasets, fine-tuned BERT hypothesis-only classifiers achieve between 86% and 96% accuracy. Our analyses further characterize the annotation artifacts and stereotypical biases in the LLM-generated datasets.
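The abstract does not spell out how the PMI association scores are computed; the sketch below is a minimal, hypothetical illustration (not the authors' code) of sentence-level PMI between vocabulary words and a set of identity terms, PMI(w, g) = log2(p(w, g) / (p(w) p(g))). The example term list, whitespace tokenization, and minimum-count filter are assumptions for illustration only.

```python
from collections import Counter
from math import log2

def pmi_by_group(sentences, group_terms, min_count=10):
    """Rank vocabulary words by PMI with sentences containing any group term."""
    group_terms = set(group_terms)
    n = len(sentences)
    word_counts = Counter()   # number of sentences containing word w
    joint_counts = Counter()  # number of sentences containing w AND a group term
    group_count = 0           # number of sentences containing a group term

    for sent in sentences:
        tokens = set(sent.lower().split())
        has_group = bool(tokens & group_terms)
        group_count += has_group
        for w in tokens:
            word_counts[w] += 1
            if has_group:
                joint_counts[w] += 1

    if group_count == 0:
        return []
    p_g = group_count / n
    scores = {}
    for w, c_w in word_counts.items():
        if c_w < min_count or joint_counts[w] == 0:
            continue
        p_w = c_w / n
        p_wg = joint_counts[w] / n
        scores[w] = log2(p_wg / (p_w * p_g))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: words most associated with gendered terms in generated hypotheses
gender_terms = {"he", "she", "man", "woman", "his", "her"}
hypotheses = ["A man is playing guitar.", "She is outside.", "A dog runs."]
print(pmi_by_group(hypotheses, gender_terms, min_count=1)[:10])
```

Words with high PMI for a given term group (e.g., gendered words) are candidates for the stereotypical associations the paper reports; the actual study would run this over the full LLM-generated hypothesis sets rather than toy data.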

Country of Origin
🇺🇸 United States

Page Count
16 pages

Category
Computer Science:
Computation and Language