OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding
By: Deming Ding, Shichun Liu, Enhui Yang, and more
Potential Business Impact:
Helps AI agents follow complex coding rules.
Modern coding scaffolds turn LLMs into capable software agents, but the agents' ability to follow scaffold-specified instructions remains under-examined, especially when constraints are heterogeneous and persist across interactions. To fill this gap, we introduce OctoBench, which benchmarks scaffold-aware instruction following in repository-grounded agentic coding. OctoBench comprises 34 environments and 217 tasks instantiated under three scaffold types, paired with 7,098 objective checklist items. To disentangle solving the task from following the rules, we provide an automated observation-and-scoring toolkit that captures full trajectories and performs fine-grained checks. Experiments on eight representative models reveal a systematic gap between task-solving ability and scaffold-aware compliance, underscoring the need for training and evaluation that explicitly target heterogeneous instruction following. We release the benchmark to support reproducible evaluation and to accelerate the development of more scaffold-aware coding agents.
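To make the checklist-based scoring idea concrete, here is a minimal sketch of how compliance could be computed separately from task success, assuming a trajectory is an ordered log of agent actions and each checklist item is an automatically evaluable predicate. The names ChecklistItem, compliance_rate, and the example rules are hypothetical illustrations, not the actual OctoBench toolkit API.

# Hypothetical sketch of checklist-based compliance scoring over an agent
# trajectory. Names and structure are illustrative assumptions, not the
# actual OctoBench toolkit.
from dataclasses import dataclass
from typing import Callable, List

# A trajectory is modeled here as the ordered list of actions the agent
# emitted while working in the repository (tool calls, edits, messages).
Trajectory = List[dict]

@dataclass
class ChecklistItem:
    """One objective, automatically checkable rule from the scaffold."""
    description: str
    check: Callable[[Trajectory], bool]  # True iff the rule was followed

def compliance_rate(trajectory: Trajectory, items: List[ChecklistItem]) -> float:
    """Fraction of checklist items the trajectory satisfies.

    Kept separate from task success so that 'solved the task' and
    'followed the scaffold rules' can be reported independently.
    """
    if not items:
        return 1.0
    passed = sum(1 for item in items if item.check(trajectory))
    return passed / len(items)

# Example: two toy rules a scaffold might impose.
items = [
    ChecklistItem(
        "Never edit files under vendor/",
        lambda traj: all(
            not step.get("path", "").startswith("vendor/")
            for step in traj if step.get("action") == "edit"
        ),
    ),
    ChecklistItem(
        "Run the test suite at least once",
        lambda traj: any(
            step.get("action") == "run" and "pytest" in step.get("cmd", "")
            for step in traj
        ),
    ),
]

trajectory = [
    {"action": "edit", "path": "src/app.py"},
    {"action": "run", "cmd": "pytest -q"},
]
print(f"compliance: {compliance_rate(trajectory, items):.2f}")  # compliance: 1.00

Reporting compliance as a rate over fine-grained checks, rather than a single pass/fail verdict, is what lets a benchmark of this kind expose the gap between solving the task and obeying heterogeneous, persistent constraints.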
Similar Papers
CodeAlignBench: Assessing Code Generation Models on Developer-Preferred Code Adjustments
Software Engineering
Tests if AI can adjust code to developer preferences.
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
Software Engineering
Tests AI's ability to build real computer programs.