Towards Valid Student Simulation with Large Language Models
By: Zhihao Yuan, Yunze Xiao, Ming Li, and more
Potential Business Impact:
Teaches computers to act like students who are learning.
This paper presents a conceptual and methodological framework for large language model (LLM) based student simulation in educational settings. The authors identify a core failure mode, termed the "competence paradox," in which broadly capable LLMs are asked to emulate partially knowledgeable learners, leading to unrealistic error patterns and learning dynamics. To address this, the paper reframes student simulation as a constrained generation problem governed by an explicit Epistemic State Specification (ESS), which defines what a simulated learner can access, how its errors are structured, and how its state evolves over time. The work further introduces a Goal-by-Environment framework for situating simulated student systems according to their behavioral objectives and deployment contexts. Rather than proposing a new system or benchmark, the paper synthesizes prior literature, formalizes key design dimensions, and articulates open challenges related to validity, evaluation, and ethical risks. Overall, the paper argues that epistemic fidelity, rather than surface realism, is a prerequisite for using LLM-based simulated students as reliable scientific and pedagogical instruments.
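To make the constrained-generation framing concrete, here is a minimal Python sketch of how an ESS might be encoded and enforced. The class name EpistemicStateSpec, its fields (known_skills, error_modes, learning_rate), and the rejection-filtering helper filter_candidates are illustrative assumptions for this summary, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EpistemicStateSpec:
    """Hypothetical encoding of the three ESS dimensions described above:
    what the simulated learner can access, how its errors are structured,
    and how its state evolves over time."""
    known_skills: set[str] = field(default_factory=set)          # accessible knowledge
    error_modes: dict[str, float] = field(default_factory=dict)  # skill -> slip probability
    learning_rate: float = 0.1                                   # state-evolution parameter

    def can_answer(self, required_skills: set[str]) -> bool:
        """A candidate response is admissible only if every skill it
        requires lies inside the learner's accessible knowledge."""
        return required_skills <= self.known_skills

    def update(self, practiced_skill: str) -> None:
        """Toy state evolution: practicing a skill lowers its slip probability."""
        p = self.error_modes.get(practiced_skill, 0.3)
        self.error_modes[practiced_skill] = max(0.0, p - self.learning_rate)


def filter_candidates(spec: EpistemicStateSpec,
                      candidates: list[tuple[str, set[str]]]) -> list[str]:
    """Constrained generation reduced to rejection filtering: keep only
    LLM outputs whose required skills are consistent with the ESS."""
    return [text for text, skills in candidates if spec.can_answer(skills)]


if __name__ == "__main__":
    spec = EpistemicStateSpec(known_skills={"add", "sub"})
    candidates = [("3+4=7", {"add"}), ("3*4=12", {"mul"})]
    print(filter_candidates(spec, candidates))  # only the addition answer survives
```

In this sketch the competence paradox is addressed by construction: the multiplication answer is rejected not because it is wrong, but because the simulated learner's epistemic state does not license it.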
Similar Papers
Simulating Students with Large Language Models: A Review of Architecture, Mechanisms, and Role Modelling in Education with Generative AI
Computers and Society
Lets computers act like students to test teaching.
Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents
Machine Learning (CS)
Makes computer students learn like real kids.
Simulated Students in Tutoring Dialogues: Substance or Illusion?
Computation and Language
Helps AI tutors learn better from fake students.