Domain Specific Benchmarks for Evaluating Multimodal Large Language Models
By: Khizar Anjum, Muhammad Arbab Arshad, Kadhim Hayawi, and more
Potential Business Impact:
Organizes AI benchmarks by subject area.
Large language models (LLMs) are increasingly being deployed across disciplines due to their advanced reasoning and problem-solving capabilities. To measure their effectiveness, various benchmarks have been developed that assess aspects of LLM reasoning, comprehension, and problem-solving. While several surveys address LLM evaluation and benchmarks, a domain-specific analysis remains underexplored in the literature. This paper introduces a taxonomy of seven key disciplines, encompassing various domains and application areas where LLMs are extensively utilized. Additionally, we provide a comprehensive review of LLM benchmarks and survey papers within each domain, highlighting the unique capabilities of LLMs and the challenges faced in their application. Finally, we compile and categorize these benchmarks by domain to create an accessible resource for researchers, aiming to pave the way for advancements toward artificial general intelligence (AGI).
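The compiled resource described above is, in effect, a domain-to-benchmark mapping. A minimal sketch of how such a categorized collection might be represented and queried follows; the domain labels, benchmark entries, and the `Benchmark`/`benchmarks_for` names are illustrative assumptions, not the paper's actual seven-discipline taxonomy or data format.

```python
from dataclasses import dataclass

@dataclass
class Benchmark:
    """One benchmark entry in a domain-categorized collection (hypothetical schema)."""
    name: str
    task: str      # e.g. "question answering", "code generation"
    modality: str  # "text" or "multimodal"

# Hypothetical domain labels and entries, for illustration only;
# the paper's actual taxonomy and benchmark list may differ.
BENCHMARKS_BY_DOMAIN: dict[str, list[Benchmark]] = {
    "medicine": [Benchmark("MedQA", "question answering", "text")],
    "law": [Benchmark("LegalBench", "legal reasoning", "text")],
}

def benchmarks_for(domain: str) -> list[Benchmark]:
    """Look up all benchmarks filed under a given discipline."""
    return BENCHMARKS_BY_DOMAIN.get(domain, [])

if __name__ == "__main__":
    for b in benchmarks_for("medicine"):
        print(f"{b.name}: {b.task} ({b.modality})")
```

A flat mapping like this makes it straightforward to filter benchmarks by discipline, task, or modality when selecting an evaluation suite for a given application.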
Similar Papers
A Survey of Large Language Models in Discipline-specific Research: Challenges, Methods and Opportunities
Computation and Language
Helps computers learn and work in many science fields.
A Survey on Large Language Model Benchmarks
Computation and Language
Tests AI language skills, finds flaws, suggests fixes.
Cross-Task Benchmarking and Evaluation of General-Purpose and Code-Specific Large Language Models
Software Engineering
Makes computers better at understanding language and code.