Agents of Discovery
By: Sascha Diefenbacher, Anna Hallin, Gregor Kasieczka and more
Potential Business Impact:
Lets computers analyze science data like people.
The substantial data volumes encountered in modern particle physics and other domains of fundamental physics research allow (and require) the use of increasingly complex data analysis tools and workflows. While the use of machine learning (ML) tools for data analysis has recently proliferated, these tools are typically special-purpose algorithms that rely, for example, on encoded physics knowledge to reach optimal performance. In this work, we investigate a new and orthogonal direction: using recent progress in large language models (LLMs) to create a team of agents -- instances of LLMs with specific subtasks -- that jointly solve data analysis-based research problems in a way similar to how a human researcher might: by creating code to operate standard tools and libraries (including ML systems) and by building on results of previous iterations. If successful, such agent-based systems could be deployed to automate routine analysis components to counteract the increasing complexity of modern tool chains. To investigate the capabilities of current-generation commercial LLMs, we consider the task of anomaly detection via the publicly available and highly studied LHC Olympics dataset. Several current models by OpenAI (GPT-4o, o4-mini, GPT-4.1, and GPT-5) are investigated and their stability tested. Overall, we observe the capacity of the agent-based system to solve this data analysis problem. The best agent-created solutions mirror the performance of human state-of-the-art results.
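To make the idea of agents "building on results of previous iterations" concrete, the following is a minimal sketch of a propose-run-review loop. It is not the authors' system: the LLM calls are replaced by a deterministic stub, the data are a toy one-feature sample rather than the LHC Olympics dataset, and every name here (`make_toy_events`, `propose_cut`, `run_analysis`, `agent_loop`) is hypothetical and chosen only for illustration.

```python
import math
import random


def make_toy_events(n_background=2000, n_signal=60, seed=0):
    """Toy stand-in for a dataset: (feature, label) pairs, label 1 = injected signal."""
    rng = random.Random(seed)
    events = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_background)]
    events += [(rng.gauss(4.0, 0.3), 1) for _ in range(n_signal)]
    rng.shuffle(events)
    return events


def propose_cut(history):
    """Stub for a 'planner' agent. A real system would query an LLM here (assumption)."""
    if not history:
        return 0.0
    best = max(history, key=lambda h: h["score"])
    step = 2.0 / len(history)                  # shrink the search step over time
    direction = 1 if len(history) % 2 else -1  # alternate search direction
    return best["cut"] + direction * step


def run_analysis(events, cut):
    """Stub for an 'executor' agent: apply the cut and compute a toy significance."""
    selected = [label for x, label in events if x > cut]
    n_signal = sum(selected)
    n_background = len(selected) - n_signal
    score = n_signal / math.sqrt(n_background + 1)  # +1 avoids division by zero
    return {"cut": cut, "score": score}


def agent_loop(events, n_iterations=8):
    """Each iteration sees the full history of earlier results before proposing."""
    history = []
    for _ in range(n_iterations):
        cut = propose_cut(history)
        history.append(run_analysis(events, cut))
    best = max(history, key=lambda h: h["score"])
    return best, history


if __name__ == "__main__":
    best, history = agent_loop(make_toy_events())
    print(f"best cut {best['cut']:.2f} with toy significance {best['score']:.2f}")
```

The toy score uses truth labels, which a real anomaly search would not have; the sketch only illustrates the control flow in which each proposal is conditioned on the accumulated history.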
Similar Papers
Automating High Energy Physics Data Analysis with LLM-Powered Agents
Data Analysis, Statistics and Probability
Computers automatically analyze science data.
Large Language Model-based Data Science Agent: A Survey
Artificial Intelligence
Lets computers help scientists analyze data.
Can Theoretical Physics Research Benefit from Language Agents?
Computation and Language
Helps scientists discover new physics faster.