PhysicsEval: Inference-Time Techniques to Improve the Reasoning Proficiency of Large Language Models on Physics Problems
By: Oshayer Siddique, J. M Areeb Uzair Alam, Md Jobayer Rahman Rafy, and more
Potential Business Impact:
Helps computers solve hard physics problems better.
The discipline of physics stands as a cornerstone of human intellect, driving the evolution of technology and deepening our understanding of the fundamental principles of the cosmos. Contemporary literature includes some work centered on solving physics problems, a crucial domain of natural-language reasoning. In this paper, we evaluate the performance of frontier LLMs on physics problems, both mathematical and descriptive. We also employ a range of inference-time techniques and agentic frameworks to improve the models' performance, including cumulative verification of proposed solutions by other, smaller LLM agents, and we compare the gains each technique yields. The multi-agent framework produces significant improvements on problems that the models initially perform poorly on. Furthermore, we introduce PhysicsEval, a new evaluation benchmark of 19,609 physics problems sourced from various textbooks, with corresponding correct solutions scraped from physics forums and educational websites. Our code and data are publicly available at https://github.com/areebuzair/PhysicsEval.
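To make the cumulative-verification idea concrete, here is a minimal sketch of how smaller verifier agents might critique and refine a frontier model's solution at inference time. The `call_model` helper, the model names, the prompts, and the loop structure are all illustrative assumptions for this sketch, not the paper's exact pipeline; consult the repository above for the authors' actual implementation.

```python
# Minimal sketch: cumulative verification of a proposed solution by smaller LLM agents.
# All model names and prompts below are illustrative placeholders.

SOLVER = "frontier-model"                                 # assumed large solver model
VERIFIERS = ["small-model-a", "small-model-b", "small-model-c"]  # assumed smaller agents


def call_model(model: str, prompt: str) -> str:
    """Placeholder for any chat-completion API; swap in a real client here."""
    return "VALID"  # dummy reply so the sketch runs end to end


def solve_with_cumulative_verification(problem: str, max_rounds: int = 3) -> str:
    # 1. The large model drafts an initial step-by-step solution.
    solution = call_model(SOLVER, f"Solve this physics problem step by step:\n{problem}")
    feedback_log: list[str] = []  # critiques accumulate across verifiers and rounds

    for _ in range(max_rounds):
        round_feedback = []
        # 2. Each smaller agent checks the current solution, seeing earlier feedback.
        for verifier in VERIFIERS:
            critique = call_model(
                verifier,
                "Check this physics solution for errors in reasoning, units, and "
                f"arithmetic.\nProblem: {problem}\nSolution: {solution}\n"
                f"Earlier feedback: {feedback_log}\n"
                "Reply VALID if correct, otherwise describe the error.",
            )
            if "VALID" not in critique:
                round_feedback.append(critique)

        # 3. If every verifier accepts the solution, stop early.
        if not round_feedback:
            return solution

        # 4. Otherwise the solver revises using the accumulated critiques.
        feedback_log.extend(round_feedback)
        solution = call_model(
            SOLVER,
            f"Revise your solution to:\n{problem}\n"
            f"Previous attempt: {solution}\nCritiques: {feedback_log}",
        )
    return solution
```

The key design choice sketched here is that verifier feedback is pooled rather than discarded between rounds, so each revision is conditioned on every critique raised so far; this is one plausible reading of "verification in a cumulative fashion" under the stated assumptions.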
Similar Papers
Can Theoretical Physics Research Benefit from Language Agents?
Computation and Language
Helps scientists discover new physics faster.
Interpretable Physics Reasoning and Performance Taxonomy in Vision-Language Models
Machine Learning (CS)
Tests if computers understand how things move.
Advancing AI-Scientist Understanding: Multi-Agent LLMs with Interpretable Physics Reasoning
Artificial Intelligence
AI helps scientists solve physics problems better.