Mathematics and Coding are Universal AI Benchmarks
By: Przemyslaw Chojecki
We study the special role of mathematics and coding within the moduli space of psychometric batteries for AI agents. Building on the AAI framework and GVU dynamics from previous work, we define the Mathematics Fiber and show that, when paired with formal proof kernels (e.g., Lean, Coq), GVU flows on this fiber admit spectrally stable self-improvement regimes owing to oracle-like verification. Our main technical result is a density theorem: under uniform tightness of agent outputs and a Lipschitz AAI functional, the subspace of batteries generated by mathematical theorem-proving and coding tasks is dense in the moduli space of batteries with respect to the evaluation metric. Coding alone is universal in this sense, whereas pure mathematics is not: the privilege of mathematics is spectral rather than expressive. We interpret this as evidence that mathematics and coding provide "universal coordinates" for evaluation, and that formal mathematics is a natural ignition domain for recursive self-improvement in advanced AI agents.
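To see why density paired with a Lipschitz functional matters for evaluation, here is a schematic sketch. The notation is introduced only for illustration and is not taken from the paper: M is the moduli space of batteries, d the evaluation metric, B_MC the subspace generated by theorem-proving and coding tasks, Φ the AAI functional, and L its Lipschitz constant. It also assumes the Lipschitz property holds with respect to d on batteries. Under these assumptions, every battery's AAI score is approximated to within Lε by some mathematics/coding battery:

$$
\overline{\mathcal{B}_{\mathrm{MC}}}^{\,d} = \mathcal{M}
\;\Longrightarrow\;
\forall B \in \mathcal{M},\ \forall \varepsilon > 0,\ \exists B' \in \mathcal{B}_{\mathrm{MC}}:\quad
d(B, B') < \varepsilon
\ \text{ and }\
\lvert \Phi(B) - \Phi(B') \rvert \le L\, d(B, B') \le L\,\varepsilon .
$$

Density provides the nearby battery B', and the Lipschitz bound carries that closeness over to the scores.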