SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models

Published: March 12, 2025 | arXiv ID: 2503.13503v3

By: Chuan Qin, Xin Chen, Chengrui Wang and more

Potential Business Impact:

Checks if AI can help science discover new things.

Business Areas:

Artificial Intelligence, Data and Analytics, Science and Engineering, Software

In recent years, the rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has revolutionized the paradigm of scientific discovery, establishing AI-for-Science (AI4Science) as a dynamic and evolving field. However, an effective framework for the overall assessment of AI4Science is still lacking, particularly one that takes a holistic view of data quality and model capability. In this study, we therefore propose SciHorizon, a comprehensive assessment framework designed to benchmark the readiness of AI4Science from both the scientific data and the LLM perspectives. First, we introduce a generalizable framework for assessing AI-ready scientific data, encompassing four key dimensions (Quality, FAIRness, Explainability, and Compliance), which are subdivided into 15 sub-dimensions. Drawing on data resource papers published between 2018 and 2023 in peer-reviewed journals, we present recommendation lists of AI-ready datasets for Earth, Life, and Materials Sciences, making an original contribution to the field. Concurrently, to assess the capabilities of LLMs across multiple scientific disciplines, we establish 16 assessment dimensions based on five core indicators (Knowledge, Understanding, Reasoning, Multimodality, and Values), spanning Mathematics, Physics, Chemistry, Life Sciences, and Earth and Space Sciences. Using the developed benchmark datasets, we have conducted a comprehensive evaluation of over 50 representative open-source and closed-source LLMs. All results are publicly available online at www.scihorizon.cn/en.

Evaluating Large Language Models in Scientific Discovery

Artificial Intelligence

Tests if AI can do real science experiments.

17 Dec 2025 2

89%

HiSciBench: A Hierarchical Multi-disciplinary Benchmark for Scientific Intelligence from Reading to Discovery

Artificial Intelligence

Tests computers on science from reading to discovery.

28 Dec 2025 0

89%

Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation

Computation and Language

AI helps scientists discover new things faster.

7 Feb 2025 2

Page Count

36 pages
