Gaming the Arena: AI Model Evaluation and the Viral Capture of Attention
By: Sam Hind
Potential Business Impact:
AI models are now ranked by battling each other head to head.
Innovation in artificial intelligence (AI) has always depended on technological infrastructures, from code repositories to computing hardware. Yet industry, rather than universities, has become increasingly influential in shaping AI innovation. As generative forms of AI powered by large language models (LLMs) have driven the breakout of AI into the wider world, the AI community has sought to develop new methods for independently evaluating the performance of AI models. How best, in other words, to compare the performance of AI models against one another, and how best to account for new models launched on a near-daily basis? Building on recent work in media studies, STS, and computer science on benchmarking and the practices of AI evaluation, I examine the rise of so-called 'arenas' in which AI models are evaluated through gladiatorial-style 'battles'. Through a technography of a leading user-driven AI model evaluation platform, LMArena, I consider five themes central to the emerging 'arena-ization' of AI innovation. I argue that arena-ization is being powered by a 'viral' desire to capture attention both inside and outside the AI community, a desire critical to the scaling and commercialization of AI products. In the discussion, I reflect on the implications of 'arena gaming', a phenomenon through which model developers hope to capture attention.
Similar Papers
Inclusion Arena: An Open Platform for Evaluating Large Foundation Models with Real-World Apps
Artificial Intelligence
Tests AI by seeing how people like its answers.
Board Game Arena: A Framework and Benchmark for Assessing Large Language Models via Strategic Play
Artificial Intelligence
Tests AI smarts with board games.