Generating Planning Feedback for Open-Ended Programming Exercises with LLMs
By: Mehmet Arif Demirtaş, Claire Zheng, Max Fowler, and more
Potential Business Impact:
Helps autograders give students feedback on their code plans, even when the code has mistakes.
To complete an open-ended programming exercise, students need to both plan a high-level solution and implement it using the appropriate syntax. However, these problems are often autograded on the correctness of the final submission through test cases, and students cannot get feedback on their planning process. Large language models (LLMs) may be able to generate this feedback by detecting the overall code structure even for submissions with syntax errors. To this end, we propose an approach that uses LLMs to detect which high-level goals and patterns (i.e., programming plans) exist in a student program. We show that both the full GPT-4o model and a small variant (GPT-4o-mini) can detect these plans with remarkable accuracy, outperforming baselines inspired by conventional approaches to code analysis. We further show that the smaller, cost-effective variant (GPT-4o-mini) achieves results on par with state-of-the-art (GPT-4o) after fine-tuning, with promising implications for using smaller models in real-time grading. These smaller models can be incorporated into autograders for open-ended code-writing exercises to provide feedback on students' implicit planning skills, even when their program is syntactically incorrect. Furthermore, LLMs may be useful in providing feedback for problems in other domains where students start with a set of high-level solution steps and iteratively compute the output, such as math and physics problems.
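To make the approach concrete, here is a minimal sketch of the kind of plan-detection call the abstract describes: asking an LLM whether each programming plan appears in a submission, even one that does not parse. The plan labels, prompt wording, `PLANS` list, and `detect_plans` helper are illustrative assumptions, not the authors' actual prompts or fine-tuned setup.

```python
# Sketch: ask an LLM which programming plans a student submission attempts.
# Plan labels, prompt, and model choice are assumptions for illustration.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical plan catalog for an "average of positive numbers" exercise.
PLANS = [
    "accumulate a running total in a loop",
    "count items matching a condition",
    "guard against division by zero",
]

def detect_plans(student_code: str) -> dict[str, bool]:
    """Return {plan: present?} even if student_code has syntax errors."""
    prompt = (
        "For each plan below, decide whether the student program attempts "
        "it, even if the code does not parse. Reply with a JSON object "
        "mapping each plan to true or false.\n\n"
        "Plans:\n" + "\n".join(f"- {p}" for p in PLANS) +
        "\n\nStudent program:\n" + student_code
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # the smaller, cost-effective variant
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# A submission with a syntax error can still get plan-level feedback.
buggy = "total = 0\nfor x in nums\n    if x > 0:\n        total += x"
print(detect_plans(buggy))
```

In an autograder, each plan flagged as absent could be mapped to a targeted feedback message about the student's planning, independent of whether the test cases can even run.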
Similar Papers
Assessing Large Language Models for Automated Feedback Generation in Learning Programming Problem Solving
Software Engineering
AI helps teachers grade student code better.
LLM-as-a-Grader: Practical Insights from Large Language Model for Short-Answer and Report Evaluation
Computation and Language
Computer grades student work like a teacher.
Open, Small, Rigmarole -- Evaluating Llama 3.2 3B's Feedback for Programming Exercises
Computers and Society
Helps small AI give good feedback on student code.