KoSimpleQA: A Korean Factuality Benchmark with an Analysis of Reasoning LLMs

Published: October 21, 2025 | arXiv ID: 2510.18368v1

By: Donghyeon Ko, Yeguk Jin, Kyubyung Chae and more

Potential Business Impact:

Tests whether AI models answer factual questions about Korea correctly.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

We present $\textbf{Korean SimpleQA (KoSimpleQA)}$, a benchmark for evaluating factuality in large language models (LLMs) with a focus on Korean cultural knowledge. KoSimpleQA is designed to be challenging yet easy to grade, consisting of 1,000 short, fact-seeking questions with unambiguous answers. We conduct a comprehensive evaluation across a diverse set of open-source LLMs of varying sizes that support Korean, and find that even the strongest model generates the correct answer only 33.7% of the time, underscoring the challenging nature of KoSimpleQA. Notably, performance rankings on KoSimpleQA differ substantially from those on the English SimpleQA, highlighting the unique value of our dataset. Furthermore, our analysis of reasoning LLMs shows that engaging reasoning capabilities in the factual QA task can both help models better elicit their latent knowledge and improve their ability to abstain when uncertain. KoSimpleQA can be found at https://anonymous.4open.science/r/KoSimpleQA-62EB.
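Because the questions are short and have unambiguous answers, grading in SimpleQA-style benchmarks typically reduces to labeling each response as correct, incorrect, or not attempted (an abstention), then summarizing those labels. The sketch below illustrates this kind of summary; the label names and the `summarize` helper are illustrative assumptions, not the paper's actual evaluation code.

```python
# Minimal sketch of SimpleQA-style metric aggregation, assuming each model
# response has already been judged as "correct", "incorrect", or
# "not_attempted". The function and field names here are hypothetical.
from collections import Counter

def summarize(grades):
    """Return overall accuracy, accuracy on attempted questions,
    and the abstention rate from a list of judged grades."""
    counts = Counter(grades)
    total = len(grades)
    attempted = counts["correct"] + counts["incorrect"]
    return {
        "accuracy": counts["correct"] / total,
        "accuracy_given_attempted": (
            counts["correct"] / attempted if attempted else 0.0
        ),
        "abstention_rate": counts["not_attempted"] / total,
    }

# Hypothetical judged grades for three questions:
print(summarize(["correct", "not_attempted", "incorrect"]))
```

Separating overall accuracy from accuracy-given-attempted is what makes abstention visible: a model that declines to answer when uncertain loses overall accuracy but can score well on the questions it does attempt.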

Country of Origin
🇰🇷 Korea, Republic of

Page Count
6 pages

Category
Computer Science:
Computation and Language