Can we trust LLMs as a tutor for our students? Evaluating the Quality of LLM-generated Feedback in Statistics Exams

Published: November 6, 2025 | arXiv ID: 2511.04213v1

By: Markus Herklotz, Niklas Ippisch, Anna-Carolina Haensch

Potential Business Impact:

Helps students learn with personalized computer feedback.

Business Areas:

Test and Measurement, Data and Analytics

One of the central challenges for instructors is offering meaningful individual feedback, especially in large courses. Faced with limited time and resources, educators are often forced to rely on generalized feedback, even when more personalized support would be pedagogically valuable. One potential technical solution to this limitation is to use large language models (LLMs). In an exploratory study using a new platform connected to LLMs, we conducted an LLM-corrected mock exam during the "Introduction to Statistics" lecture at the University of Munich (Germany). The online platform allows instructors to upload exercises along with the correct solutions. Students complete these exercises and receive overall feedback on their results, as well as individualized feedback generated by GPT-4 based on the correct answers provided by the lecturers. The resulting dataset comprised task-level information for all participating students, including individual responses and the corresponding LLM-generated feedback. Our systematic analysis revealed that approximately 7% of the 2,389 feedback instances contained errors, ranging from minor technical inaccuracies to conceptually misleading explanations. Further, using a combined feedback framework approach, we found that the feedback predominantly focused on explaining why an answer was correct or incorrect, with fewer instances providing deeper conceptual insights, learning strategies, or self-regulatory advice. These findings highlight both the potential and the limitations of deploying LLMs as scalable feedback tools in higher education, emphasizing the need for careful quality monitoring and prompt design to maximize their pedagogical value.
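To give a sense of scale for the reported error rate, the following minimal Python sketch converts the rounded percentage from the abstract into an approximate count of erroneous feedback instances. The abstract does not state the exact count. The function name `approx_error_count` is hypothetical, and the 6.5% to 7.5% bounds assume that "approximately 7%" means rounding to the nearest whole percent.

```python
def approx_error_count(total: int, error_rate: float) -> int:
    """Return the rounded number of instances implied by a rate."""
    return round(total * error_rate)


total_feedback = 2389   # feedback instances, as stated in the abstract
reported_rate = 0.07    # "approximately 7%", a rounded figure

# Point estimate implied by the rounded rate.
estimate = approx_error_count(total_feedback, reported_rate)

# Assumption: the rate was rounded to the nearest whole percent,
# so any true rate between 6.5% and 7.5% is consistent with it.
low = approx_error_count(total_feedback, 0.065)
high = approx_error_count(total_feedback, 0.075)

print(f"about {estimate} erroneous instances (plausibly {low} to {high})")
```

Under these assumptions, the estimate is on the order of 170 erroneous feedback instances out of 2,389. This is a back-of-the-envelope figure and not a number reported by the authors.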

Beyond Correctness: Evaluating and Improving LLM Feedback in Statistical Education

Other Statistics

Helps teachers give better feedback to students.

10 Nov 2025 1

91%

Dean of LLM Tutors: Exploring Comprehensive and Automated Evaluation of LLM-generated Educational Feedback via LLM Feedback Evaluators

Computers and Society

Checks AI teacher's answers for students.

8 Aug 2025 0

90%

Personalized and Constructive Feedback for Computer Science Students Using the Large Language Model (LLM)

Computers and Society

Gives students personalized feedback to learn better.

13 Oct 2025 0

Country of Origin

🇩🇪 Germany

Page Count

32 pages
