Everyday Physics in Korean Contexts: A Culturally Grounded Physical Reasoning Benchmark
By: Jihae Jeong, DaeYeop Lee, DongGeon Lee and more
Potential Business Impact:
Tests how well computers reason about everyday physics in Korean cultural settings.
Existing physical commonsense reasoning benchmarks predominantly focus on Western contexts, overlooking cultural variations in physical problem-solving. To address this gap, we introduce EPiK (Everyday Physics in Korean Contexts), a novel benchmark comprising 181 binary-choice problems that test physical reasoning within Korean cultural contexts, ranging from kimchi (Korean food) to traditional fermentation. EPiK is constructed using a two-stage generation and verification pipeline to create culturally authentic problems across 9 reasoning subtasks and 84 scenarios. Unlike approaches based on simple translation, our method generates problems organically from Korean contexts while upholding rigorous physical reasoning standards. Our evaluations show that Korean-specialized models consistently outperform general-purpose models of comparable size. This performance gap highlights the limitations of culturally agnostic models and demonstrates the critical need for culturally aware benchmarks to truly measure language understanding. EPiK is publicly available at https://huggingface.co/datasets/jjae/EPiK.
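As a rough illustration of how a binary-choice benchmark of this kind is typically scored, the sketch below computes plain accuracy over a list of problems. It is not the authors' evaluation code: the field names (`question`, `option_a`, `option_b`, `label`), the `choose` callback, and the toy items are all hypothetical placeholders, since the abstract does not describe the dataset schema.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical record layout: these field names are assumptions made for
# illustration and are not taken from the EPiK dataset.
Problem = Mapping[str, object]


def binary_choice_accuracy(
    problems: Sequence[Problem],
    choose: Callable[[str, str, str], int],
) -> float:
    """Return the fraction of problems where `choose` picks the labeled option.

    `choose(question, option_a, option_b)` must return 0 for option A
    or 1 for option B.
    """
    if not problems:
        raise ValueError("need at least one problem to compute accuracy")
    correct = 0
    for p in problems:
        pred = choose(str(p["question"]), str(p["option_a"]), str(p["option_b"]))
        correct += int(pred == p["label"])
    return correct / len(problems)


# Toy placeholder items, invented for this sketch (not benchmark content).
toy_problems = [
    {"question": "q1", "option_a": "a1", "option_b": "b1", "label": 0},
    {"question": "q2", "option_a": "a2", "option_b": "b2", "label": 1},
]


def always_first(question: str, option_a: str, option_b: str) -> int:
    """Trivial baseline that always answers option A."""
    return 0


baseline_accuracy = binary_choice_accuracy(toy_problems, always_first)
```

With a balanced label distribution, a constant baseline like `always_first` scores about 0.5, which is the chance level to compare model accuracies against on a two-option task.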
Similar Papers
Ko-PIQA: A Korean Physical Commonsense Reasoning Dataset with Cultural Context
Computation and Language
Helps computers understand Korean culture and common sense.
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Computation and Language
Tests how well AI understands hard science problems.