RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback
By: Chunyu Miao, Henry Peng Zou, Yangning Li, and more
Potential Business Impact:
Helps AI agents write better scientific research code by learning from researcher feedback.
Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, overlooking the iterative, feedback-driven nature of realistic scientific research development workflows. To address this gap, we present RECODE-H, a benchmark of 102 tasks from research papers and repositories that evaluates LLM agents through multi-turn interactions with LLM-simulated human feedback. It includes structured instructions, unit tests, and a five-level feedback hierarchy to reflect realistic researcher-agent collaboration. We further present ReCodeAgent, a framework that integrates feedback into iterative code generation. Experiments with leading LLMs, including GPT-5, Claude-Sonnet-4, DeepSeek-V3.1, and Gemini 2.5, show substantial performance gains with richer feedback, while also highlighting ongoing challenges in the generation of complex research code. RECODE-H establishes a foundation for developing adaptive, feedback-driven LLM agents in scientific research implementation.
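The abstract describes a multi-turn protocol: an agent generates code, the benchmark's unit tests are run, and an LLM simulates researcher feedback whose richness varies across five levels. As a rough illustration of what such a loop might look like, here is a minimal Python sketch; every name in it (Task, generate_code, simulate_feedback, MAX_TURNS, feedback_level) is a hypothetical placeholder, not the paper's actual ReCodeAgent API.

```python
# A minimal sketch of a multi-turn, feedback-driven code-generation loop in
# the spirit of RECODE-H. All names below are illustrative assumptions,
# not the paper's ReCodeAgent implementation.

from dataclasses import dataclass
from typing import Callable, Tuple

MAX_TURNS = 5  # assumed per-task interaction budget


@dataclass
class Task:
    instructions: str                          # structured task instructions
    tests: Callable[[str], Tuple[bool, str]]   # unit tests -> (passed, report)


def generate_code(history):
    """Placeholder for an LLM call that returns candidate code."""
    return "def solve(): ..."


def simulate_feedback(report, code, level):
    """Placeholder for LLM-simulated researcher feedback; richer levels are
    assumed to reveal more detail (e.g., error locations, fix suggestions)."""
    return f"[level-{level} feedback] {report}"


def solve_task(task, feedback_level=3):
    """Generate, test, and iteratively refine code until the unit tests
    pass or the turn budget is exhausted."""
    history = [("user", task.instructions)]
    code = ""
    for turn in range(MAX_TURNS):
        code = generate_code(history)
        passed, report = task.tests(code)
        if passed:
            return code, turn + 1            # solved within the budget
        feedback = simulate_feedback(report, code, feedback_level)
        history += [("assistant", code), ("user", feedback)]
    return code, MAX_TURNS                   # budget exhausted; last attempt
```

In this sketch, the feedback_level parameter stands in for the paper's five-level feedback hierarchy, which the experiments vary to measure how feedback richness affects performance.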
Similar Papers
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Software Engineering
Helps computers write computer programs from descriptions.