From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
By: Chengliang Zhou, Mei Wang, Ting Zhang, and more
Potential Business Impact:
Helps computers create good school test questions.
Large Language Models (LLMs) have demonstrated remarkable capabilities in mathematical problem-solving. However, the transition from providing answers to generating high-quality educational questions presents significant challenges that remain underexplored. To advance Educational Question Generation (EQG) and support LLMs in generating pedagogically valuable and educationally effective questions, we introduce EQGBench, a comprehensive benchmark specifically designed to evaluate LLMs' performance on Chinese EQG. EQGBench establishes a five-dimensional evaluation framework supported by a dataset of 900 evaluation samples spanning three fundamental middle school disciplines: mathematics, physics, and chemistry. The dataset incorporates user queries with varying knowledge points, difficulty gradients, and question-type specifications to simulate realistic educational scenarios. Through a systematic evaluation of 46 mainstream large language models, we reveal substantial room for improvement in generating questions that reflect educational value and foster students' comprehensive abilities.
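To make the benchmark's structure concrete, the sketch below shows what an EQGBench-style evaluation sample (subject, knowledge point, difficulty, question type, user query) and a five-dimensional scoring record could look like. The field names, dimension names, and scoring stub are illustrative assumptions, not the paper's actual schema or evaluation pipeline.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical sketch of an EQGBench-style evaluation sample.
# Field names are assumptions made for illustration only.
@dataclass
class EQGSample:
    subject: str          # "mathematics", "physics", or "chemistry"
    knowledge_point: str  # e.g. "linear equations in one variable"
    difficulty: str       # e.g. "easy" | "medium" | "hard"
    question_type: str    # e.g. "multiple choice" | "fill in the blank"
    user_query: str       # the instruction given to the model

# Placeholder names for a five-dimensional evaluation framework;
# the paper's actual dimensions may differ.
DIMENSIONS = ["relevance", "correctness", "difficulty_match",
              "pedagogical_value", "clarity"]

def score_generation(generated_question: str, sample: EQGSample) -> Dict[str, float]:
    """Return a per-dimension score in [0, 1] for a generated question.

    In practice such scores would come from human raters or an LLM judge;
    this stub only makes the data flow runnable end to end.
    """
    return {dim: 0.0 for dim in DIMENSIONS}

if __name__ == "__main__":
    sample = EQGSample(
        subject="mathematics",
        knowledge_point="linear equations in one variable",
        difficulty="medium",
        question_type="fill in the blank",
        user_query="Write one medium-difficulty fill-in-the-blank question "
                   "on linear equations in one variable.",
    )
    question = "Solve for x: 3x + 5 = 20. x = ____"
    print(score_generation(question, sample))
```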
Similar Papers
OmniEduBench: A Comprehensive Chinese Benchmark for Evaluating Large Language Models in Education
Computation and Language
Tests how well AI learns and thinks like students.
EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education
Computation and Language
Tests AI for schoolwork, finds strengths and weaknesses.
QCBench: Evaluating Large Language Models on Domain-Specific Quantitative Chemistry
Artificial Intelligence
Tests if computers can do math for chemistry.