Generative AI alone may not be enough: Evaluating AI Support for Learning Mathematical Proof

Published: September 20, 2025 | arXiv ID: 2509.16778v1

By: Eason Chen, Sophia Judicke, Kayla Beigh and more

Potential Business Impact:

Helps students learn math better with AI.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

We evaluate the effectiveness of LLM-Tutor, a large language model (LLM)-powered tutoring system that combines an AI-based proof-review tutor for real-time feedback on proof-writing with a chatbot for mathematics-related queries. In an experiment involving 148 students, use of LLM-Tutor significantly improved homework performance compared to a control group without access to the system. However, its effects on exam performance and time spent on tasks were not significant. Mediation analysis revealed that students with lower self-efficacy tended to use the chatbot more frequently, which partially contributed to lower midterm scores. Students with lower self-efficacy were also more likely to engage frequently with the AI proof-review tutor, a usage pattern that contributed to higher final exam scores. Interviews with 19 students highlighted the accessibility of LLM-Tutor and its effectiveness in addressing learning needs, while also revealing limitations and concerns about potential over-reliance on the tool. Our results suggest that generative AI alone, such as a chatbot, may not suffice for comprehensive learning support, underscoring the need for iterative design improvements grounded in learning sciences principles in generative AI educational tools like LLM-Tutor.

Country of Origin
🇺🇸 United States

Page Count
20 pages

Category
Computer Science:
Human-Computer Interaction