Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation
By: Hung-Shin Lee, Chen-Chi Chang, Ching-Yuan Chen, et al.
Potential Business Impact:
Tests whether AI language models understand and can apply knowledge from different cultures.
This study proposes a cognitive benchmarking framework to evaluate how large language models (LLMs) process and apply culturally specific knowledge. The framework integrates Bloom's Taxonomy with Retrieval-Augmented Generation (RAG) to assess model performance across six hierarchical cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Using a curated digital archive of Taiwanese Hakka culture as the primary testbed, the evaluation measures the semantic accuracy and cultural relevance of LLM-generated responses.
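The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical sketch of one plausible reading of such an evaluation loop: retrieve archive passages, prompt the model at each Bloom level, and score the response against a reference answer. The toy archive passages, the `ask_llm` stub, the keyword-overlap retrieval, and the bag-of-words similarity are illustrative stand-ins, not the paper's actual RAG components or scoring metric.

```python
"""Hypothetical sketch of a Bloom's-Taxonomy RAG evaluation loop.

All pieces here (archive passages, retrieval, ask_llm, scoring) are
illustrative stand-ins, not the paper's actual pipeline or metric.
"""
from collections import Counter
import math

BLOOM_LEVELS = ["Remembering", "Understanding", "Applying",
                "Analyzing", "Evaluating", "Creating"]

# Toy stand-in for the curated Hakka cultural archive.
ARCHIVE = [
    "Hakka lei cha (thunder tea) is a ground tea dish shared at gatherings.",
    "Tung blossom festivals celebrate the flowering season in Hakka villages.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Naive keyword-overlap retrieval; a real system would use a vector index."""
    def overlap(passage: str) -> int:
        return len(set(question.lower().split()) & set(passage.lower().split()))
    return sorted(ARCHIVE, key=overlap, reverse=True)[:k]

def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM call; swap in a real API client here."""
    return "Lei cha is a traditional Hakka ground tea shared at gatherings."

def semantic_score(answer: str, reference: str) -> float:
    """Bag-of-words cosine similarity as a crude proxy for semantic accuracy."""
    a, b = Counter(answer.lower().split()), Counter(reference.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def evaluate(question: str, reference: str, level: str) -> float:
    """Retrieve context, query the model at a given Bloom level, score the answer."""
    context = "\n".join(retrieve(question))
    prompt = f"[{level}] Using this context:\n{context}\n\nQuestion: {question}"
    return semantic_score(ask_llm(prompt), reference)

if __name__ == "__main__":
    question = "What is Hakka lei cha?"
    reference = "Lei cha is a ground tea dish traditionally shared at Hakka gatherings."
    for level in BLOOM_LEVELS:
        print(f"{level:13s} semantic score: {evaluate(question, reference, level):.2f}")
```

In the study itself, the retrieval step draws on the Hakka digital cultural archive and the scoring reflects both semantic accuracy and cultural relevance; the stub functions above only mark where those components would plug in.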
Similar Papers
From Facts to Folklore: Evaluating Large Language Models on Bengali Cultural Knowledge
Computation and Language
Helps computers understand Bengali culture better.
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Computation and Language
Tests how AI uses outside facts to answer questions.
EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education
Computation and Language
Tests AI on schoolwork and identifies its strengths and weaknesses.