ConvoLearn: A Dataset of Constructivist Tutor-Student Dialogue
By: Mayank Sharma, Roy Pea, Hari Subramonyam
In educational applications, LLMs exhibit several fundamental pedagogical limitations, such as a tendency to reveal solutions rather than support dialogic learning. We introduce ConvoLearn (https://huggingface.co/datasets/masharma/convolearn), a dataset grounded in knowledge-building theory that operationalizes six core pedagogical dimensions: cognitive engagement, formative assessment, accountability, cultural responsiveness, metacognition, and power dynamics. We construct a semi-synthetic dataset of 1,250 tutor-student dialogues (20 turns each) in middle school Earth Science through controlled interactions between human teachers and a simulated student. Using QLoRA, we show that training on this dataset meaningfully shifts LLM behavior toward knowledge-building strategies. In a human evaluation by 31 teachers, our fine-tuned Mistral 7B (M = 4.10, SD = 1.03) received significantly higher overall ratings than both its base version (M = 2.59, SD = 1.11) and Claude Sonnet 4.5 (M = 2.87, SD = 1.29). This work offers a potential framework to guide future development and evaluation of constructivist AI tutors.
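The reported means and standard deviations can be turned into a rough standardized effect size. The sketch below computes Cohen's d with a pooled SD, assuming equal numbers of ratings per condition and independent samples. The abstract does not state the number of ratings per model or the statistical test used, and ratings from the same 31 teachers may well be paired, so this is back-of-the-envelope arithmetic, not the authors' analysis. The names `REPORTED` and `cohens_d` are illustrative.

```python
import math

# Overall teacher ratings (mean, SD) as reported in the abstract.
REPORTED = {
    "fine-tuned Mistral 7B": (4.10, 1.03),
    "base Mistral 7B": (2.59, 1.11),
    "Claude Sonnet 4.5": (2.87, 1.29),
}


def cohens_d(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Cohen's d with a pooled SD, ASSUMING equal group sizes.

    With equal n, the pooled variance reduces to the average of the two
    variances, so the (unreported) sample sizes are not needed.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


if __name__ == "__main__":
    m_ft, sd_ft = REPORTED["fine-tuned Mistral 7B"]
    for name in ("base Mistral 7B", "Claude Sonnet 4.5"):
        m, sd = REPORTED[name]
        print(f"fine-tuned vs {name}: d = {cohens_d(m_ft, sd_ft, m, sd):.2f}")
```

If the ratings are in fact paired (each teacher rating all three models), a paired effect size based on per-teacher differences would be the more appropriate measure, and it cannot be recovered from the summary statistics alone.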
Similar Papers
Exploring Conversational Design Choices in LLMs for Pedagogical Purposes: Socratic and Narrative Approaches for Improving Instructor's Teaching Practice
Human-Computer Interaction
Helps teachers learn to use AI better.
The StudyChat Dataset: Student Dialogues With ChatGPT in an Artificial Intelligence Course
Artificial Intelligence
Helps teachers see how students use AI.
EduDial: Constructing a Large-scale Multi-turn Teacher-Student Dialogue Corpus
Computation and Language
Teaches computers to be better student tutors.