
Do Large Language Models Truly Understand Cross-cultural Differences?

Published: December 8, 2025 | arXiv ID: 2512.07075v1

By: Shiwei Guo, Sihang Jiang, Qianxi He and more

Potential Business Impact:

Tests if computers understand different cultures.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

In recent years, large language models (LLMs) have demonstrated strong performance on multilingual tasks. Given their wide range of applications, cross-cultural understanding is a crucial competency. However, existing benchmarks for evaluating whether LLMs genuinely possess this capability suffer from three key limitations: a lack of contextual scenarios, insufficient cross-cultural concept mapping, and limited ability to probe deep cultural reasoning. To address these gaps, we propose SAGE, a scenario-based benchmark built via cross-cultural core concept alignment and generative task design, to evaluate LLMs' cross-cultural understanding and reasoning. Grounded in cultural theory, we categorize cross-cultural capabilities into nine dimensions. Using this framework and following established item design principles, we curated 210 core concepts and constructed 4530 test items across 15 specific real-world scenarios, organized under four broader categories of cross-cultural situations. The SAGE dataset supports continuous expansion, and experiments confirm its transferability to other languages. SAGE reveals model weaknesses across both dimensions and scenarios, exposing systematic limitations in cross-cultural reasoning. While progress has been made, LLMs are still some distance from a truly nuanced cross-cultural understanding. In compliance with the anonymity policy, we include data and code in the supplementary materials. In future versions, we will make them publicly available online.

Cross-Task Benchmarking and Evaluation of General-Purpose and Code-Specific Large Language Models

Software Engineering

Makes computers better at understanding language and code.

4 Dec 2025

91%

Language over Content: Tracing Cultural Understanding in Multilingual Large Language Models

Computation and Language

Shows how computers understand different cultures.

18 Oct 2025

91%

Memorization $\neq$ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?

Computation and Language

Computers don't truly understand stories, they just remember them.

5 Sep 2025


Page Count

21 pages
