Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants
By: Shi Ding, Brian Magerko
Potential Business Impact:
Guides making AI helpful and fair for schools.
As generative artificial intelligence (AI) continues to transform education, most existing AI evaluations rely primarily on technical performance metrics such as accuracy or task efficiency while overlooking human identity, learner agency, contextual learning processes, and ethical considerations. In this paper, we present TEACH-AI (Trustworthy and Effective AI Classroom Heuristics), a domain-independent, pedagogically grounded, and stakeholder-aligned framework with measurable indicators and a practical toolkit for guiding the design, development, and evaluation of generative AI systems in educational contexts. Built on an extensive literature review and synthesis, the ten-component assessment framework and toolkit checklist provide a foundation for scalable, value-aligned AI evaluation in education. TEACH-AI rethinks "evaluation" through sociotechnical, educational, theoretical, and applied lenses, engaging designers, developers, researchers, and policymakers across AI and education. Our work invites the community to reconsider what constitutes "effective" AI in education and to design model evaluation approaches that promote co-creation, inclusivity, and long-term human, social, and educational impact.
Similar Papers
Pedagogy-driven Evaluation of Generative AI-powered Intelligent Tutoring Systems
Computation and Language
Creates fair tests for AI tutors.
A principled way to think about AI in education: guidance for action based on goals, models of human learning, and use of technologies
Computers and Society
Guides schools to use AI to help students learn.
Enhancing AI-Driven Education: Integrating Cognitive Frameworks, Linguistic Feedback Analysis, and Ethical Considerations for Improved Content Generation
Computation and Language
Makes AI learning tools smarter and safer.