LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing
By: Hao Li, Yiqun Zhang, Zhaoyan Guo, et al.
Potential Business Impact:
Routes each query to the AI model best suited to answer it.
Large language model (LLM) routing assigns each query to the most suitable model from an ensemble. We introduce LLMRouterBench, a large-scale benchmark and unified framework for LLM routing. It comprises over 400K instances drawn from 21 datasets and 33 models, provides comprehensive metrics for both performance-oriented routing and performance-cost trade-off routing, and integrates 10 representative routing baselines. Using LLMRouterBench, we systematically re-evaluate the field. While confirming strong model complementarity (the central premise of LLM routing), we find that many routing methods perform similarly under unified evaluation, and that several recent approaches, including commercial routers, fail to reliably outperform a simple baseline. Meanwhile, a substantial gap to the Oracle remains, driven primarily by persistent model-recall failures. We further show that the choice of backbone embedding model has limited impact, that larger ensembles exhibit diminishing returns compared to careful model curation, and that the benchmark also enables latency-aware analysis. All code and data are available at https://github.com/ynulihao/LLMRouterBench.
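To make the two routing regimes in the abstract concrete, here is a minimal illustrative sketch of score-based routing with a cost penalty. All names, scores, and costs below are hypothetical and are not taken from LLMRouterBench or its API; the point is only that a cost weight of zero recovers pure performance-oriented routing, while larger weights trade quality for cheaper models.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    predicted_quality: float  # router's quality estimate for this query, in [0, 1]
    cost_per_query: float     # e.g. dollars per query (hypothetical units)

def route(candidates, cost_weight=0.0):
    """Pick the model maximizing predicted quality minus a cost penalty.

    cost_weight = 0 is performance-oriented routing; increasing it
    moves along the performance-cost trade-off curve.
    """
    return max(candidates,
               key=lambda c: c.predicted_quality - cost_weight * c.cost_per_query)

# Hypothetical ensemble for illustration only.
models = [
    Candidate("large-model", 0.90, 10.0),
    Candidate("mid-model",   0.85, 2.0),
    Candidate("small-model", 0.70, 0.5),
]

print(route(models, cost_weight=0.0).name)   # → large-model (quality only)
print(route(models, cost_weight=0.01).name)  # → mid-model (cost-aware)
```

An Oracle router, against which the benchmark measures the remaining gap, would instead select using the true per-query outcome rather than the router's `predicted_quality` estimate.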