ACE-RLHF: Automated Code Evaluation and Socratic Feedback Generation Tool using Large Language Models and Reinforcement Learning with Human Feedback
By: Tasnia Rahman, Sathish A. P. Kumar, Sumit Jha, and more
Potential Business Impact:
Fixes computer code errors with smart questions.
Automated program repair tools are developed to generate feedback and suggest repairs for erroneous code. State-of-the-art (SOTA) code repair methods rely on data-driven approaches and often fail to deliver solutions for complicated programming questions. Using Large Language Models (LLMs) for code-feedback generation is crucial for interpreting the natural language of previously unseen programming problems. LLMs generate more comprehensible feedback than compiler-generated error messages, and Reinforcement Learning with Human Feedback (RLHF) further improves feedback quality by keeping a human in the loop, which helps novice students learn programming from scratch interactively. We apply RLHF fine-tuning to elicit the expected Socratic response, such as a question with a hint that guides the student toward solving the programming issue. We propose a code-feedback generation tool that fine-tunes LLMs with RLHF, Automated Code Evaluation with RLHF (ACE-RLHF), combining two open-source LLM models with two different SOTA optimization techniques. Feedback quality is evaluated on two benchmark datasets containing basic and competition-level programming questions, the latter of which we propose. In automated evaluation, Llama-3-7B with Proximal Policy Optimization (PPO) achieves 2-5% higher accuracy than RL-free SOTA techniques, and similar or slightly higher accuracy than reward-model-free RL with AI Feedback (RLAIF). In manual evaluation, GPT-3.5 with Best-of-n optimization achieves almost 40% higher accuracy.
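To make the Best-of-n optimization mentioned above concrete, here is a minimal sketch of the general technique, assuming a Hugging Face causal LM and a separately trained reward model. The function name best_of_n, the decoding parameters, and the reward_fn placeholder are illustrative assumptions, not the authors' implementation: the idea is simply to sample n candidate Socratic feedback messages and keep the one the reward model scores highest.

```python
# Illustrative Best-of-n reranking sketch (not the paper's exact code).
# Any Hugging Face causal LM works; model names below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

def best_of_n(prompt, model, tokenizer, reward_fn, n=8, max_new_tokens=128):
    """Sample n candidate feedback messages; return the highest-scoring one."""
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    outputs = model.generate(
        **inputs,
        do_sample=True,            # stochastic decoding so candidates differ
        top_p=0.9,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens from each returned sequence.
    candidates = [
        tokenizer.decode(out[prompt_len:], skip_special_tokens=True)
        for out in outputs
    ]
    # Rank candidates with the reward model and keep the best.
    scores = [reward_fn(prompt, c) for c in candidates]
    return candidates[scores.index(max(scores))]

# Example usage (hypothetical; reward_fn stands in for a trained reward model):
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# reward = lambda prompt, resp: -abs(len(resp.split()) - 30)  # dummy scorer
# print(best_of_n("This code fails on negative inputs. Ask a guiding question:", lm, tok, reward))
```

In the paper's setting the reward model would be trained from human preference data over Socratic responses; the dummy scorer above is only a stand-in to keep the sketch self-contained.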
Similar Papers
Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
Artificial Intelligence
Helps computers write code faster and better.
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Machine Learning (Stat)
Makes AI understand what people want better.
ACE-RL: Adaptive Constraint-Enhanced Reward for Long-form Generation Reinforcement Learning
Computation and Language
Helps computers write better, longer stories.