Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring
By: Mina Almasi, Ross Deans Kristensen-McLachlan
Potential Business Impact:
Teaches you a new language, but not perfectly.
This paper investigates the potential of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning. In particular, we evaluate whether system prompting can reliably constrain LLMs to generate only text appropriate to the student's competence level. We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters. Dialogues are generated by having an LLM alternate between tutor and student roles with separate chat histories. The output from the tutor model is then used to evaluate the effectiveness of CEFR-based prompting to control text difficulty across three proficiency levels (A1, B1, C1). Our findings suggest that while system prompting can be used to constrain model outputs, prompting alone is too brittle for sustained, long-term interactional contexts, a phenomenon we term alignment drift. Our results offer insights into the feasibility of LLMs as personalized, proficiency-aligned adaptive tutors, along with a scalable method for low-cost evaluation of model performance without human participants.
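To make the simulation setup concrete, below is a minimal sketch of such a roleplay loop, assuming a generic chat-completion interface. The prompt wording, the seed utterance, and the `llm_generate` placeholder are illustrative assumptions, not the authors' exact implementation; the key ideas from the abstract are the CEFR-level system prompts and the separate chat histories for the tutor and student roles.

```python
# Illustrative sketch of an LLM-vs-LLM tutoring dialogue with separate
# chat histories per role. Prompts and llm_generate are placeholders.

CEFR_TUTOR_PROMPT = (
    "You are a Spanish tutor. Only produce Spanish text at CEFR level {level}. "
    "Keep vocabulary and grammar strictly within that level."
)
STUDENT_PROMPT = (
    "You are a Spanish learner at CEFR level {level}. "
    "Respond to your tutor in Spanish at that level."
)


def llm_generate(messages: list[dict]) -> str:
    """Placeholder for a call to an instruction-tuned open-source chat
    model (e.g. a 7B-12B model); plug in your inference backend here."""
    raise NotImplementedError


def simulate_dialogue(level: str, n_turns: int = 10) -> list[str]:
    # Tutor and student keep SEPARATE histories: each side sees the
    # other's messages only as incoming "user" turns.
    tutor_history = [
        {"role": "system", "content": CEFR_TUTOR_PROMPT.format(level=level)}
    ]
    student_history = [
        {"role": "system", "content": STUDENT_PROMPT.format(level=level)}
    ]

    tutor_outputs = []
    student_msg = "Hola, quiero practicar español."  # seed utterance (assumed)
    for _ in range(n_turns):
        # Tutor replies to the student's latest message.
        tutor_history.append({"role": "user", "content": student_msg})
        tutor_msg = llm_generate(tutor_history)
        tutor_history.append({"role": "assistant", "content": tutor_msg})
        tutor_outputs.append(tutor_msg)  # only tutor output is evaluated

        # Student replies to the tutor from its own history.
        student_history.append({"role": "user", "content": tutor_msg})
        student_msg = llm_generate(student_history)
        student_history.append({"role": "assistant", "content": student_msg})
    return tutor_outputs
```

Running this at each proficiency level (A1, B1, C1) yields tutor transcripts whose turn-by-turn difficulty can then be scored against the prompted CEFR level, which is how drift away from the target level over long interactions would surface.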
Similar Papers
Can Large Language Models Match Tutoring System Adaptivity? A Benchmarking Study
Computation and Language
Computers can't teach as well as humans yet.
Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages
Computation and Language
Makes AI understand and work in many languages.
Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback
Computation and Language
Teaches math better in any language.