Building Effective Safety Guardrails in AI Education Tools
By: Hannah-Beth Clark, Laura Benton, Emma Searle, and others
Potential Business Impact:
Makes AI lesson plans safe for kids.
There has been rapid development of generative AI tools across the education sector, which in turn is leading to increased adoption by teachers. However, this raises concerns about the safety and age-appropriateness of the AI-generated content being created for use in classrooms. This paper explores Oak National Academy's approach to addressing these concerns in the development of the UK Government's first publicly available generative AI tool: our AI-powered lesson-planning assistant, Aila. Aila is intended to support teachers in planning national curriculum-aligned lessons that are appropriate for pupils aged 5-16. To mitigate the safety risks associated with AI-generated content, we have implemented four key safety guardrails: (1) prompt engineering, to ensure AI outputs are generated within pedagogically sound and curriculum-aligned parameters; (2) input threat detection, to mitigate attacks; (3) an Independent Asynchronous Content Moderation Agent (IACMA), to assess outputs against predefined safety categories; and (4) a human-in-the-loop approach, encouraging teachers to review generated content before it is used in the classroom. Through our ongoing evaluation of these guardrails we have identified several challenges and opportunities to take into account when implementing and testing safety guardrails. This paper highlights ways to build more effective safety guardrails in generative AI education tools, including the ongoing iteration and refinement of guardrails, as well as enabling cross-sector collaboration by sharing open-source code, datasets, and learnings.
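The four guardrails described above form a layered pipeline around the model call. The following is a minimal, hypothetical sketch of how such layering can be composed; all names, checks, and categories here are illustrative assumptions, not Oak National Academy's actual Aila implementation.

```python
# Illustrative sketch of a four-guardrail pipeline: (1) constrained prompt,
# (2) input threat detection, (3) output moderation, (4) human-in-the-loop.
# All signatures, blocklist entries, and categories are hypothetical.

BLOCKLIST = {"ignore previous instructions", "reveal your system prompt"}  # toy threat signatures
SAFETY_CATEGORIES = ("violence", "self-harm", "age-inappropriate language")  # toy categories


def detect_input_threat(user_input: str) -> bool:
    """Guardrail 2: flag inputs resembling prompt-injection attempts."""
    lowered = user_input.lower()
    return any(signature in lowered for signature in BLOCKLIST)


def build_prompt(user_input: str, key_stage: str) -> str:
    """Guardrail 1: wrap the request in pedagogically constrained instructions."""
    return (
        "You are a lesson-planning assistant for the English national curriculum.\n"
        f"Key stage: {key_stage}. Keep all content age-appropriate.\n"
        f"Teacher request: {user_input}"
    )


def moderate_output(lesson_text: str) -> dict:
    """Guardrail 3: stand-in for an asynchronous moderation agent that
    scores generated content against predefined safety categories."""
    lowered = lesson_text.lower()
    flagged = [c for c in SAFETY_CATEGORIES if c.split("-")[0].split()[0] in lowered]
    return {"flagged": flagged, "passed": not flagged}


def plan_lesson(user_input: str, key_stage: str, generate) -> dict:
    """Run a lesson-planning request through all four guardrails.

    `generate` is any callable that maps a prompt string to model output.
    """
    if detect_input_threat(user_input):
        return {"status": "blocked", "reason": "input threat detected"}
    draft = generate(build_prompt(user_input, key_stage))
    report = moderate_output(draft)
    # Guardrail 4: output is always a draft pending teacher review,
    # never auto-published to the classroom.
    return {"status": "pending_teacher_review", "draft": draft, "moderation": report}
```

In this sketch the moderation step runs inline for simplicity; an independent asynchronous agent, as the paper describes, would instead score the draft out-of-band so moderation failures cannot be masked by the generating model itself.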
Similar Papers
From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails
Artificial Intelligence
AI learns to fix its own mistakes before harm.
Exploring Student Behaviors and Motivations using AI TAs with Optional Guardrails
Human-Computer Interaction
Helps AI tutors guide students, not just give answers.
Auto-Evaluation: A Critical Measure in Driving Improvements in Quality and Safety of AI-Generated Lesson Resources
Computers and Society
Helps teachers plan lessons faster and better.