The performances of the Chinese and U.S. Large Language Models on the Topic of Chinese Culture

Published: January 6, 2026 | arXiv ID: 2601.02830v1

By: Feiyan Liu, Chenxun Zhuo, Siyan Zhao, and more

Potential Business Impact:

Chinese AI knows Chinese culture better than US AI.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Cultural backgrounds shape individuals' perspectives and approaches to problem-solving. Since the emergence of GPT-1 in 2018, large language models (LLMs) have undergone rapid development. To date, the world's ten leading LLM developers are primarily based in China and the United States. To examine whether LLMs released by Chinese and U.S. developers exhibit cultural differences in Chinese-language settings, we evaluate their performance on questions about Chinese culture. This study adopts a direct-questioning paradigm to evaluate models such as GPT-5.1, DeepSeek-V3.2, Qwen3-Max, and Gemini 2.5 Pro. We assess their understanding of traditional Chinese culture, including history, literature, poetry, and related domains. Comparative analyses between LLMs developed in China and the U.S. indicate that Chinese models generally outperform their U.S. counterparts on these tasks. Among U.S.-developed models, Gemini 2.5 Pro and GPT-5.1 achieve relatively higher accuracy. The observed performance differences may arise from variations in training data distribution, localization strategies, and the degree of emphasis on Chinese cultural content during model development.
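The direct-questioning paradigm described above can be sketched as a simple evaluation loop: pose each question to a model, compare its answer to a gold reference, and report accuracy. A minimal sketch follows; note that the paper's actual prompts, question set, and scoring criterion are not given here, so `ask_model`, the toy questions, and the exact-match scoring rule are all illustrative assumptions.

```python
# Minimal sketch of a direct-questioning evaluation loop.
# Assumptions: `ask_model` is a hypothetical stand-in for a real LLM API
# call; the question set and exact-match scoring are illustrative only.

def score_model(ask_model, questions):
    """Return a model's accuracy on (question, gold_answer) pairs."""
    correct = 0
    for question, gold in questions:
        answer = ask_model(question)
        # Exact-match after whitespace normalization; the paper may use
        # a different criterion (e.g. human grading or an LLM judge).
        if answer.strip() == gold.strip():
            correct += 1
    return correct / len(questions)

# Toy questions for illustration (not taken from the paper).
QUESTIONS = [
    ("Who wrote the poem 'Quiet Night Thought' (静夜思)?", "Li Bai"),
    ("Which dynasty immediately preceded the Tang dynasty?", "Sui"),
]

def stub_model(question):
    # Stand-in for a real model call; always answers "Li Bai".
    return "Li Bai"

print(score_model(stub_model, QUESTIONS))  # → 0.5
```

In practice, `ask_model` would wrap an API call to each evaluated model, and per-domain accuracies (history, literature, poetry) would be aggregated for the cross-model comparison.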

Page Count
7 pages

Category
Computer Science:
Computation and Language