Assessment of Evolving Large Language Models in Upper Secondary Mathematics
By: Mika Setälä, Pieta Sikström, Ville Heilala, and more
Potential Business Impact:
Computers now solve math problems like top students.
Large language models (LLMs) have shown increasing promise in educational settings, yet their mathematical reasoning has long been regarded as a work in progress. This study evaluates the mathematical capabilities of several LLMs using the Finnish matriculation examination, a high-stakes digital test concluding upper secondary education. Initial tests yielded moderate performance corresponding to mid-range grades, but later evaluations showed substantial improvement as the models evolved. Remarkably, some models achieved near-perfect or perfect scores, matching top student performance and qualifying for university admission. Our findings highlight the rapid advances in the mathematical proficiency of LLMs and illustrate their potential as tools to support learning and teaching in a variety of ways.
Similar Papers
A Report on the llms evaluating the high school questions
Computation and Language
Helps computers answer hard science questions.
Multilingual Performance Biases of Large Language Models in Education
Computation and Language
Tests whether computers serve students equally well across languages.
Beyond Final Answers: Evaluating Large Language Models for Math Tutoring
Human-Computer Interaction
Helps computers teach math, but they make mistakes.