"The Dentist is an involved parent, the bartender is not": Revealing Implicit Biases in QA with Implicit BBQ

Published: December 7, 2025 | arXiv ID: 2512.06732v1

By: Aarushi Wagh, Saniya Srivastava

Potential Business Impact:

Finds hidden unfairness in AI language models.

Business Areas:
Semantic Search, Internet Services

Existing benchmarks for evaluating bias in large language models (LLMs) rely primarily on explicit cues, naming protected attributes such as religion, race, or gender outright. Real-world interactions, however, often carry implicit biases, inferred subtly through names, cultural cues, or traits. This oversight creates a significant blind spot in fairness evaluation. We introduce ImplicitBBQ, a benchmark extending the Bias Benchmark for QA (BBQ) with implicitly cued protected attributes across six categories. Our evaluation of GPT-4o on ImplicitBBQ reveals a troubling performance disparity relative to explicit BBQ prompts, with accuracy declining by up to 7% in the "sexual orientation" subcategory and consistent declines across most other categories. This indicates that current LLMs harbor implicit biases that explicit benchmarks fail to detect. ImplicitBBQ offers a crucial tool for more nuanced fairness evaluation in NLP.
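To make the evaluation setup concrete, here is a minimal sketch of a BBQ-style multiple-choice accuracy loop of the kind the abstract describes. The item schema, field names, and the `ask_model` callable are hypothetical illustrations, not the paper's actual data format or code; the only grounded idea is comparing accuracy on explicitly versus implicitly cued versions of the same items.

```python
# Hypothetical sketch of an ImplicitBBQ-style evaluation loop.
# The QAItem schema and ask_model() are assumptions for illustration;
# the paper's real dataset format is not shown in the abstract.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class QAItem:
    context: str        # ambiguous or disambiguated scenario text
    question: str       # e.g., "Who is an involved parent?"
    options: List[str]  # answer choices, typically including "Unknown"
    answer_idx: int     # index of the correct (unbiased) option
    cue_type: str       # "explicit" (BBQ) or "implicit" (ImplicitBBQ)

def accuracy(items: List[QAItem], ask_model: Callable[[str], int]) -> float:
    """Fraction of items where the model selects the correct option."""
    correct = 0
    for item in items:
        choices = "\n".join(f"{i}. {opt}" for i, opt in enumerate(item.options))
        prompt = (f"{item.context}\n{item.question}\n{choices}\n"
                  "Answer with the option number only.")
        if ask_model(prompt) == item.answer_idx:
            correct += 1
    return correct / len(items)

# The disparity the paper reports (e.g., up to a 7% drop for
# "sexual orientation") would surface as a positive gap here:
# gap = accuracy(explicit_items, ask_model) - accuracy(implicit_items, ask_model)
```

In practice `ask_model` would wrap a call to the model under test (GPT-4o in the paper) and parse the returned option number; the accuracy gap between the two cue types is the quantity of interest.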

Country of Origin
🇺🇸 United States

Page Count
6 pages

Category
Computer Science:
Computation and Language