Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models

Published: August 12, 2025 | arXiv ID: 2508.08879v1

By: Haeun Yu, Seogyeong Jeong, Siddhesh Pawar and more

Potential Business Impact:

Shows how computers misunderstand different cultures.

The growing deployment of large language models (LLMs) across diverse cultural contexts necessitates a better understanding of how the overgeneralization of less documented cultures within LLMs' representations impacts their cultural understanding. Prior work performs only extrinsic evaluation of LLMs' cultural competence, without accounting for how LLMs' internal mechanisms lead to cultural (mis)representation. To bridge this gap, we propose CultureScope, the first mechanistic interpretability-based method that probes the internal representations of LLMs to elicit the underlying cultural knowledge space. CultureScope uses a patching method to extract the cultural knowledge. We introduce a cultural flattening score as a measure of intrinsic cultural biases. Additionally, we study how LLMs internalize Western-dominance bias and cultural flattening, which allows us to trace how cultural biases emerge within LLMs. Our experimental results reveal that LLMs encode Western-dominance bias and cultural flattening in their cultural knowledge space. We find that low-resource cultures are less susceptible to cultural biases, likely due to their limited training resources. Our work provides a foundation for future research on mitigating cultural biases and enhancing LLMs' cultural understanding. The code and data used in our experiments are publicly available.

Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective

Computation and Language

Fixes computer "thinking" to be less unfair.

5 Jun 2025 0

90%

IndiCASA: A Dataset and Bias Evaluation Framework in LLMs Using Contrastive Embedding Similarity in the Indian Context

Computation and Language

Finds and fixes unfairness in AI language.

3 Oct 2025 2

90%

Language over Content: Tracing Cultural Understanding in Multilingual Large Language Models

Computation and Language

Shows how computers understand different cultures.

18 Oct 2025 0

Country of Origin

🇩🇰 Denmark

Repos / Data Links

github.com github.com

Page Count

16 pages
