The Geometry of Benchmarks: A New Path Toward AGI
By: Przemyslaw Chojecki
Potential Business Impact:
Helps AI systems learn and improve themselves more effectively.
Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoning about generality or autonomous self-improvement. Here we introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space, and agent performance is described by capability functionals over this space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance on batteries spanning families of tasks (for example reasoning, planning, tool use and long-horizon control). Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. This geometry yields determinacy results: dense families of batteries suffice to certify performance on entire regions of task space. Third, we introduce a general Generator-Verifier-Updater (GVU) operator that subsumes reinforcement learning, self-play, debate and verifier-based fine-tuning as special cases, and we define a self-improvement coefficient $\kappa$ as the Lie derivative of a capability functional along the induced flow. A variance inequality on the combined noise of generation and verification provides sufficient conditions for $\kappa > 0$. Our results suggest that progress toward artificial general intelligence (AGI) is best understood as a flow on moduli of benchmarks, driven by GVU dynamics rather than by scores on individual leaderboards.
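The GVU operator and the coefficient $\kappa$ are described only abstractly above. What follows is a toy sketch under stated assumptions, not the paper's construction. It uses a one-dimensional parameter and a hypothetical quadratic capability functional `capability`. Generator and verifier noise are Gaussian, and the updater steps a fixed fraction of the way toward the best-scored candidate. Every name and constant is an illustrative assumption. For a scalar functional $F$ and a flow $\phi_t$ generated by a vector field $V$, the Lie derivative is the rate of change of $F$ along the flow, $(\mathcal{L}_V F)(\theta) = \frac{d}{dt} F(\phi_t(\theta))\big|_{t=0}$. The sketch therefore approximates $\kappa$ by the average change in $F$ over one discrete GVU step.

```python
import random

# Toy sketch only: the 1-D parameter, quadratic capability, noise levels and
# every function name below are hypothetical, not taken from the paper.


def capability(theta: float) -> float:
    """Hypothetical capability functional F(theta), maximised at theta = 1."""
    return -(theta - 1.0) ** 2


def generate(theta: float, rng: random.Random, n: int, gen_noise: float) -> list[float]:
    """Generator: propose n candidate parameters near the current one."""
    return [theta + rng.gauss(0.0, gen_noise) for _ in range(n)]


def verify(candidate: float, rng: random.Random, ver_noise: float) -> float:
    """Verifier: a noisy estimate of the candidate's true capability."""
    return capability(candidate) + rng.gauss(0.0, ver_noise)


def update(theta: float, candidates: list[float], scores: list[float], lr: float) -> float:
    """Updater: move a fraction lr of the way toward the best-scored candidate."""
    _, best = max(zip(scores, candidates))
    return theta + lr * (best - theta)


def gvu_step(theta, rng, n=16, gen_noise=0.3, ver_noise=0.1, lr=0.5):
    """One Generator-Verifier-Updater step."""
    candidates = generate(theta, rng, n, gen_noise)
    scores = [verify(c, rng, ver_noise) for c in candidates]
    return update(theta, candidates, scores, lr)


def estimate_kappa(theta, rng, trials=200, **step_kwargs):
    """Monte Carlo mean change in F over one GVU step: a discrete stand-in
    for the Lie derivative of F along the GVU flow."""
    deltas = [
        capability(gvu_step(theta, rng, **step_kwargs)) - capability(theta)
        for _ in range(trials)
    ]
    return sum(deltas) / trials


if __name__ == "__main__":
    print(estimate_kappa(0.0, random.Random(0)))                   # far from the optimum
    print(estimate_kappa(1.0, random.Random(1)))                   # at the optimum
    print(estimate_kappa(0.0, random.Random(2), ver_noise=100.0))  # near-useless verifier
```

Starting away from the optimum, as in `estimate_kappa(0.0, random.Random(0))`, the estimate comes out positive. At the optimum `theta = 1.0` it cannot be positive, because no step can raise $F$ above its maximum. Raising `ver_noise` until the verifier's scores no longer track `capability` tends to push the estimate toward zero or slightly below. This loosely illustrates why the combined noise of generation and verification matters for $\kappa > 0$, although the paper's actual variance inequality is not reproduced here.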