PhyEduVideo: A Benchmark for Evaluating Text-to-Video Models for Physics Education
By: Megha Mariam K. M, Aditya Arun, Zakaria Laskar and more
Potential Business Impact:
AI makes physics videos for learning.
Generative AI models, particularly Text-to-Video (T2V) systems, offer a promising avenue for transforming science education by automating the creation of engaging and intuitive visual explanations. In this work, we take a first step toward evaluating their potential in physics education by introducing a dedicated benchmark for explanatory video generation. The benchmark is designed to assess how well T2V models can convey core physics concepts through visual illustrations. Each physics concept in our benchmark is decomposed into granular teaching points, with each point accompanied by a carefully crafted prompt intended for visual explanation of the teaching point. T2V models are evaluated on their ability to generate accurate videos in response to these prompts. Our aim is to systematically explore the feasibility of using T2V models to generate high-quality, curriculum-aligned educational content, paving the way toward scalable, accessible, and personalized learning experiences powered by AI. Our evaluation reveals that current models produce visually coherent videos with smooth motion and minimal flickering, yet their conceptual accuracy is less reliable. Performance in areas such as mechanics, fluids, and optics is encouraging, but models struggle with electromagnetism and thermodynamics, where abstract interactions are harder to depict. These findings underscore the gap between visual quality and conceptual correctness in educational video generation. We hope this benchmark helps the community close that gap and move toward T2V systems that can deliver accurate, curriculum-aligned physics content at scale. The benchmark and accompanying codebase are publicly available at https://github.com/meghamariamkm/PhyEduVideo.
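The hierarchy described in the abstract (concept, then teaching points, each with one prompt) could be represented roughly as follows. This is a minimal sketch only: the class names, field names, and the `mean_score_by_domain` helper are hypothetical and are not taken from the PhyEduVideo codebase, and the scores it aggregates are assumed to come from some external judgment of each generated video.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class TeachingPoint:
    """One granular teaching point and its T2V prompt (hypothetical schema)."""
    description: str
    prompt: str


@dataclass
class Concept:
    """A physics concept decomposed into teaching points (hypothetical schema)."""
    name: str
    domain: str  # e.g. "mechanics", "optics"
    teaching_points: list[TeachingPoint] = field(default_factory=list)


def iter_prompts(concepts):
    """Yield (domain, concept name, prompt) for every teaching point."""
    for concept in concepts:
        for point in concept.teaching_points:
            yield concept.domain, concept.name, point.prompt


def mean_score_by_domain(concepts, scores):
    """Average per-prompt scores within each domain.

    `scores` maps prompt text to a numeric score supplied by the caller;
    prompts without a score are skipped. This aggregation is an assumption,
    not the paper's metric.
    """
    per_domain = defaultdict(list)
    for domain, _name, prompt in iter_prompts(concepts):
        if prompt in scores:
            per_domain[domain].append(scores[prompt])
    return {domain: mean(values) for domain, values in per_domain.items()}
```

Grouping scores by domain mirrors how the abstract reports results (mechanics, fluids, and optics versus electromagnetism and thermodynamics), but the actual evaluation protocol should be taken from the linked repository.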
Similar Papers
T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation
CV and Pattern Recognition
Tests if AI videos understand how the world works.
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation
Machine Learning (CS)
Makes computer videos follow real-world physics rules.
VideoVerse: How Far is Your T2V Generator from a World Model?
CV and Pattern Recognition
Tests if AI can make videos that make sense.