EDIT-Bench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
By: Wayne Chi, Valerie Chen, Ryan Shar, and more
Potential Business Impact:
Tests how well AI can edit computer code from a person's instructions.
Instructed code editing, where LLMs directly modify a developer's existing code based on a user instruction, is becoming a widely used interaction mode in AI coding assistants. However, few benchmarks directly evaluate this capability, and current datasets often rely on artificial sources. We introduce EDIT-Bench, a benchmark for evaluating LLM code editing capabilities grounded in real-world usage, i.e., user instructions and code contexts collected in the wild. EDIT-Bench comprises 545 problems spanning multiple natural and programming languages and a diverse set of real-world use cases, ranging from resolving errors to adding features. EDIT-Bench introduces context-dependent problems that require the model to understand the code context, highlighted code, and cursor position in addition to the user instruction. We evaluate 40 diverse LLMs and observe that EDIT-Bench is a challenging set of problems on which only 5 models score over 60%. We find that model performance varies across different categories of user instructions. Further, we find that varying the level of contextual information greatly affects task success rate, with performance varying by up to 11%, indicating the importance of evaluating with realistic context.
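The abstract describes each problem as pairing a user instruction with code context, highlighted code, and cursor position. Below is a minimal sketch of what such a context-dependent problem instance and its prompt assembly might look like; the class, field names, and example are hypothetical illustrations, not EDIT-Bench's actual schema.

```python
from dataclasses import dataclass

# Hypothetical sketch of one EDIT-Bench-style problem instance.
# Field names are illustrative, not the benchmark's actual schema.
@dataclass
class EditProblem:
    instruction: str        # natural-language edit request from the user
    code_context: str       # surrounding file contents
    highlighted_code: str   # region the user selected, if any
    cursor_line: int        # where the user's cursor sits in the file
    tests: list[str]        # checks used to judge the edited code


def prompt_for(problem: EditProblem) -> str:
    """Assemble a prompt exposing instruction, context, selection, and cursor."""
    return (
        f"File contents:\n{problem.code_context}\n\n"
        f"Highlighted code:\n{problem.highlighted_code}\n"
        f"Cursor at line {problem.cursor_line}\n\n"
        f"Instruction: {problem.instruction}\n"
        "Return the full edited file."
    )


# Example: an error-resolution request, one of the real-world use-case
# categories mentioned in the abstract.
example = EditProblem(
    instruction="Fix the off-by-one error in the loop",
    code_context=(
        "def total(xs):\n"
        "    s = 0\n"
        "    for i in range(len(xs) - 1):\n"
        "        s += xs[i]\n"
        "    return s\n"
    ),
    highlighted_code="for i in range(len(xs) - 1):",
    cursor_line=3,
    tests=["assert total([1, 2, 3]) == 6"],
)
print(prompt_for(example))
```

The abstract's finding that withholding or adding context shifts success rates by up to 11% suggests that evaluation harnesses should vary which of these fields (selection, cursor, surrounding code) are included in the prompt.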
Similar Papers
Envisioning Future Interactive Web Development: Editing Webpage with Natural Language
Software Engineering
Lets computers change website designs by talking.
Understanding Robustness of Model Editing in Code LLMs: An Empirical Study
Software Engineering
Checks whether editing AI coding models keeps them working reliably.