Benchmarking Large Language Models for Personalized Guidance in AI-Enhanced Learning
By: Bo Yuan, Jiazi Hu
Potential Business Impact:
Helps AI tutors give better, more personalized learning support.
While Large Language Models (LLMs) are increasingly envisioned as intelligent assistants for personalized learning, systematic head-to-head evaluations in authentic learning scenarios remain limited. This study presents an empirical comparison of three state-of-the-art LLMs on a tutoring task that simulates a realistic learning setting. Using a dataset comprising a student's answers to ten mixed-format questions with correctness labels, each LLM is required to (i) analyze the quiz to identify the underlying knowledge components, (ii) infer the student's mastery profile, and (iii) generate targeted guidance for improvement. To mitigate subjectivity and evaluator bias, we employ Gemini as a virtual judge that performs pairwise comparisons along four dimensions: accuracy, clarity, actionability, and appropriateness. Results analyzed with the Bradley-Terry model indicate that GPT-4o is generally preferred, producing feedback that is more informative and better structured than that of its counterparts, while DeepSeek-V3 and GLM-4.5 show intermittent strengths but lower consistency. These findings support the feasibility of deploying LLMs as advanced teaching assistants for individualized support and provide methodological guidance for future empirical research on LLM-driven personalized learning.
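To make the aggregation step concrete: the abstract's protocol turns a judge's pairwise preferences into a global ranking with the Bradley-Terry model. Below is a minimal Python sketch of the standard MM fitting update (Hunter, 2004); the win counts are hypothetical placeholders, not the paper's data, and this is an illustration of the technique rather than the authors' actual code.

    # Minimal Bradley-Terry fit via the MM update (Hunter, 2004).
    # The win counts below are hypothetical placeholders, not the
    # paper's reported results; only numpy is assumed.
    import numpy as np

    def bradley_terry(wins, n_iters=200, tol=1e-8):
        # wins[i, j]: number of comparisons in which model i was
        # preferred over model j by the judge.
        n = wins.shape[0]
        p = np.ones(n)                  # initial strengths
        total = wins + wins.T           # comparisons between each pair
        for _ in range(n_iters):
            p_new = np.empty(n)
            for i in range(n):
                denom = sum(total[i, j] / (p[i] + p[j])
                            for j in range(n)
                            if j != i and total[i, j] > 0)
                p_new[i] = wins[i].sum() / denom
            p_new /= p_new.sum()        # normalize for identifiability
            if np.max(np.abs(p_new - p)) < tol:
                return p_new
            p = p_new
        return p

    # Hypothetical judge outcomes for (GPT-4o, DeepSeek-V3, GLM-4.5).
    wins = np.array([[0., 7., 8.],
                     [3., 0., 5.],
                     [2., 5., 0.]])
    print(bradley_terry(wins))          # fitted strengths, summing to 1

Under this model, the probability that model i is preferred over model j is p_i / (p_i + p_j), so the fitted strengths induce a global ranking from purely pairwise judgments.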
Similar Papers
Evaluation of LLMs for mathematical problem solving
Artificial Intelligence
Computers solve harder math problems better.
A systematic comparison of Large Language Models for automated assignment assessment in programming education: Exploring the importance of architecture and vendor
Computers and Society
Computers grade student code, but not like teachers.
Can Large Language Models Match Tutoring System Adaptivity? A Benchmarking Study
Computation and Language
Computers can't teach as well as humans yet.