Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation
By: Hung-Shin Lee, Chen-Chi Chang, Ching-Yuan Chen, et al.
Potential Business Impact:
Tests whether AI language models understand and can apply knowledge from different cultures.
This study proposes a cognitive benchmarking framework to evaluate how large language models (LLMs) process and apply culturally specific knowledge. The framework integrates Bloom's Taxonomy with Retrieval-Augmented Generation (RAG) to assess model performance across six hierarchical cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Using a curated digital archive of Taiwanese Hakka culture as the primary testbed, the evaluation measures the semantic accuracy and cultural relevance of LLM-generated responses.
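The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical sketch of one plausible reading of such an evaluation loop: retrieve archive passages, prompt the model at each Bloom level, and score the response against a reference answer. The toy archive passages, the `ask_llm` stub, the keyword-overlap retrieval, and the bag-of-words similarity are illustrative stand-ins, not the paper's actual RAG components or scoring metric.

```python
"""Hypothetical sketch of a Bloom's-Taxonomy RAG evaluation loop.

All pieces here (archive passages, retrieval, ask_llm, scoring) are
illustrative stand-ins, not the paper's actual pipeline or metric.
"""
from collections import Counter
import math

BLOOM_LEVELS = ["Remembering", "Understanding", "Applying",
                "Analyzing", "Evaluating", "Creating"]

# Toy stand-in for the curated Hakka cultural archive.
ARCHIVE = [
    "Hakka lei cha (thunder tea) is a ground tea dish shared at gatherings.",
    "Tung blossom festivals celebrate the flowering season in Hakka villages.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Naive keyword-overlap retrieval; a real system would use a vector index."""
    def overlap(passage: str) -> int:
        return len(set(question.lower().split()) & set(passage.lower().split()))
    return sorted(ARCHIVE, key=overlap, reverse=True)[:k]

def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM call; swap in a real API client here."""
    return "Lei cha is a traditional Hakka ground tea shared at gatherings."

def semantic_score(answer: str, reference: str) -> float:
    """Bag-of-words cosine similarity as a crude proxy for semantic accuracy."""
    a, b = Counter(answer.lower().split()), Counter(reference.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def evaluate(question: str, reference: str, level: str) -> float:
    """Retrieve context, query the model at a given Bloom level, score the answer."""
    context = "\n".join(retrieve(question))
    prompt = f"[{level}] Using this context:\n{context}\n\nQuestion: {question}"
    return semantic_score(ask_llm(prompt), reference)

if __name__ == "__main__":
    question = "What is Hakka lei cha?"
    reference = "Lei cha is a ground tea dish traditionally shared at Hakka gatherings."
    for level in BLOOM_LEVELS:
        print(f"{level:13s} semantic score: {evaluate(question, reference, level):.2f}")
```

In the study itself, the retrieval step draws on the Hakka digital cultural archive and the scoring reflects both semantic accuracy and cultural relevance; the stub functions above only mark where those components would plug in.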
Similar Papers
From Facts to Folklore: Evaluating Large Language Models on Bengali Cultural Knowledge
Computation and Language
Helps computers understand Bengali culture better.
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Computation and Language
Tests how AI uses outside facts to answer questions.
EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education
Computation and Language
Tests AI on schoolwork and identifies its strengths and weaknesses.