CAPE: Capability Achievement via Policy Execution

Published: December 15, 2025 | arXiv ID: 2512.14761v1

By: David Ball

Modern AI systems lack a way to express and enforce requirements. Pre-training produces intelligence and post-training optimizes preferences, but neither guarantees that models reliably satisfy explicit, context-dependent constraints. This missing abstraction explains why highly intelligent models routinely fail in deployment despite strong benchmark performance. We introduce Capability Engineering, the systematic practice of converting requirements into executable specifications and training models to satisfy them by default. We operationalize this practice through CAPE (Capability Achievement via Policy Execution), a protocol implementing a Specify -> Verify -> Correct -> Train loop. CAPE is grounded in two empirical findings: (1) contextual objectivity, where properties that appear subjective become objective once context is fixed (inter-annotator agreement rises from kappa = 0.42 to kappa = 0.98), and (2) verification-fidelity scaling, where verification accuracy improves with model scale (r = 0.94), unlike preference agreement, which plateaus at 30 to 50 percent disagreement regardless of compute. Across 109,500 examples spanning six domains, CAPE reduces violation rates by 81 percent relative to DPO (standard deviation below 0.3 percent). By replacing per-example annotation with reusable specifications, CAPE cuts costs by a factor of 5 to 20 and shortens timelines from months to weeks. We release the CAPE protocol, the PredicateGraph schema, the CPL specification language, and policy packs under Apache 2.0. We also launch CapabilityBench, a public registry of model evaluations against community-contributed policies, shifting evaluation from intelligence benchmarks toward capability measurement.
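To make the Specify -> Verify -> Correct -> Train loop concrete, here is a minimal Python sketch under stated assumptions: the abstract does not give CAPE's implementation, so PolicySpec, model.generate, and model.train_on are hypothetical stand-ins, and the plain Python predicates below approximate what a compiled CPL policy from the PredicateGraph would evaluate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch of a CAPE-style Specify -> Verify -> Correct -> Train
# pass. PolicySpec, model.generate, and model.train_on are illustrative names,
# not the paper's actual API.

@dataclass
class PolicySpec:
    """Specify: an executable requirement, a named predicate over outputs."""
    name: str
    predicate: Callable[[str, Dict], bool]  # (output, context) -> satisfied?

def cape_round(model, prompts: List[str], context: Dict,
               specs: List[PolicySpec], max_corrections: int = 3) -> None:
    """Run one Specify -> Verify -> Correct -> Train pass over a batch."""
    training_pairs = []
    for prompt in prompts:
        output = model.generate(prompt)
        compliant = None
        for _ in range(max_corrections):
            # Verify: with context fixed, each predicate is an objective
            # check (the paper's "contextual objectivity" finding).
            violations = [s for s in specs if not s.predicate(output, context)]
            if not violations:
                compliant = output
                break
            # Correct: regenerate, citing the violated policies by name.
            names = "; ".join(s.name for s in violations)
            output = model.generate(f"{prompt}\n\nRevise to satisfy: {names}")
        if compliant is not None:
            training_pairs.append((prompt, compliant))
    # Train: fine-tune on verified-compliant examples so the model
    # satisfies the specifications by default.
    model.train_on(training_pairs)
```

Because each PolicySpec is written once and reused across every example, verification takes the place of per-example preference annotation, which is the mechanism behind the claimed factor-of-5-to-20 cost reduction.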

Category
Computer Science:
Software Engineering