Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
By: Man Fai Wong, Chee Wei Tan
Potential Business Impact:
Helps computers write code faster and better.
This paper studies how AI-assisted programming and large language models (LLMs) improve software developers' productivity through LLM-based tools such as GitHub Copilot and Amazon CodeWhisperer, and how reinforcement learning from human feedback (RLHF) can be combined with crowd-sourced computation to enhance text-to-code generation. We demonstrate that our Bayesian optimization framework supports AI alignment in code generation by distributing the feedback-collection burden, highlighting the value of high-quality human feedback. Our empirical evaluations confirm the efficacy of this approach, showing how LLM agents can be trained effectively for improved text-to-code generation. The framework can be designed for general domain-specific languages, promoting the alignment of LLM capabilities with human feedback in AI-assisted programming.
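The abstract does not spell out the framework's internals, so the sketch below is only a hedged illustration, not the authors' Bayesian optimization procedure. It shows one simplified way crowd-sourced pairwise feedback on candidate code completions could be aggregated: a Beta posterior tracks each annotator's reliability, and the reliability-weighted votes then fit a Bradley-Terry reward signal. All names, the NumPy implementation, and the simulated data are assumptions for illustration.

```python
# Illustrative sketch (not the paper's implementation): aggregate noisy
# crowd-sourced pairwise preferences over candidate code completions by
# keeping a Beta posterior over each annotator's reliability, then fit
# Bradley-Terry scores from the reliability-weighted votes.
import numpy as np

rng = np.random.default_rng(0)

n_items = 5                                           # candidate completions
n_annotators = 8
true_quality = rng.normal(size=n_items)               # hidden "true" reward
reliability = rng.uniform(0.55, 0.95, n_annotators)   # hidden annotator skill

# Simulate crowd-sourced comparisons: each vote is (annotator, i, j, i_wins).
votes = []
for a in range(n_annotators):
    for _ in range(30):
        i, j = rng.choice(n_items, size=2, replace=False)
        correct = rng.random() < reliability[a]
        i_better = true_quality[i] > true_quality[j]
        votes.append((a, int(i), int(j), i_better == correct))

# Beta(1, 1) prior on each annotator's reliability, updated by agreement
# with the crowd majority (a crude proxy for gold labels; ties count
# against the annotator, which is acceptable for a sketch).
alpha = np.ones(n_annotators)
beta = np.ones(n_annotators)
majority = np.zeros((n_items, n_items))
for a, i, j, i_wins in votes:
    majority[i, j] += 1 if i_wins else -1
for a, i, j, i_wins in votes:
    agrees = (majority[i, j] > 0) == i_wins
    alpha[a] += agrees
    beta[a] += not agrees
weight = alpha / (alpha + beta)   # posterior mean reliability per annotator

# Fit Bradley-Terry item scores by weighted logistic-likelihood ascent.
scores = np.zeros(n_items)
for _ in range(500):
    grad = np.zeros(n_items)
    for a, i, j, i_wins in votes:
        p = 1.0 / (1.0 + np.exp(scores[j] - scores[i]))  # P(i beats j)
        g = weight[a] * ((1.0 if i_wins else 0.0) - p)
        grad[i] += g
        grad[j] -= g
    scores += 0.05 * grad

print("inferred ranking:", np.argsort(-scores))
print("true ranking:    ", np.argsort(-true_quality))
```

In this toy setup, down-weighting unreliable annotators distributes the feedback-collection burden across the crowd while still recovering a usable ranking, which is the kind of high-quality aggregate signal an RLHF reward model for text-to-code generation would consume.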
Similar Papers
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Machine Learning (Stat)
Makes AI better at understanding what people want.
Clone-Robust AI Alignment
Machine Learning (CS)
Teaches AI to learn better from human choices.
ACE-RLHF: Automated Code Evaluation and Socratic Feedback Generation Tool using Large Language Models and Reinforcement Learning with Human Feedback
Machine Learning (CS)
Fixes computer code errors with smart questions.