Board Game Arena: A Framework and Benchmark for Assessing Large Language Models via Strategic Play
By: Lucia Cipolina-Kun, Marianna Nezhurina, Jenia Jitsev
Potential Business Impact:
Tests AI smarts with board games.
The Board Game Arena library provides a framework for evaluating the decision-making abilities of large language models (LLMs) through strategic board games implemented in Google's OpenSpiel library. The framework enables systematic comparisons between LLM-based agents and other agents (random, human, reinforcement learning agents, etc.) across a variety of game scenarios by wrapping multiple board and matrix games and supporting different agent types. It integrates API access to models via LiteLLM, supports local model deployment via vLLM, and offers distributed execution through Ray. Additionally, it provides extensive analysis tools for LLM reasoning traces. This paper summarizes the structure, key characteristics, and motivation of the repository, highlighting how it contributes to the empirical evaluation of LLM reasoning and game-theoretic behavior.
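To make the described setup concrete, the sketch below shows one way an LLM agent could be matched against a random baseline on an OpenSpiel game, with model access routed through LiteLLM. This is a minimal illustration assuming a generic prompt-and-parse loop; the model name, prompt format, and fallback logic are our own choices and do not reproduce the Board Game Arena library's actual API.

```python
import random
import pyspiel    # Google OpenSpiel (pip install open_spiel)
import litellm    # unified API access to LLM providers

MODEL = "gpt-4o-mini"  # hypothetical choice; any LiteLLM-supported model works


def llm_choose_action(state) -> int:
    """Ask an LLM to pick one of the legal actions for the current state."""
    legal = state.legal_actions()
    prompt = (
        "You are playing tic-tac-toe. Current board:\n"
        f"{state}\n"
        f"Legal actions (integers): {legal}\n"
        "Reply with a single integer: the action you choose."
    )
    response = litellm.completion(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content.strip()
    try:
        action = int(text.split()[0])
    except ValueError:
        action = random.choice(legal)  # fall back if the reply is unparsable
    return action if action in legal else random.choice(legal)


def play_episode():
    """Play one LLM-vs-random episode and return the final scores."""
    game = pyspiel.load_game("tic_tac_toe")
    state = game.new_initial_state()
    while not state.is_terminal():
        if state.current_player() == 0:
            action = llm_choose_action(state)                  # LLM agent
        else:
            action = random.choice(state.legal_actions())      # random baseline
        state.apply_action(action)
    return state.returns()


if __name__ == "__main__":
    print(play_episode())
```

In the framework itself, such episodes would be wrapped behind agent and game abstractions and fanned out across workers via Ray, with the model's raw replies logged as reasoning traces for later analysis.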
Similar Papers
Game Reasoning Arena: A Framework and Benchmark for Assessing Reasoning Capabilities of Large Language Models via Game Play
Artificial Intelligence
Tests how smart AI plays games.
Who is a Better Player: LLM against LLM
Artificial Intelligence
Tests AI's smartness by playing board games.