Reasoning models are a new generation of large language models (LLMs) capable of complex problem solving. Their reliability in solving introductory physics problems was tested by evaluating a sample of n = 5 solutions generated by one such model, OpenAI's o3-mini, for each problem drawn from 20 chapters of a standard undergraduate textbook. In total, N = 408 problems were given to the model, and N × n = 2,040 generated solutions were examined. The model successfully solved 94% of the problems posed, excelling at the early topics in mechanics but struggling with later ones such as waves and thermodynamics.
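The sampling arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the majority-vote rule used to call a problem "solved" is an assumption, since the abstract does not state the criterion the authors applied.

```python
from typing import List

N_PROBLEMS = 408         # N: problems posed to the model (from the abstract)
SAMPLES_PER_PROBLEM = 5  # n: solutions generated per problem (from the abstract)


def total_solutions(n_problems: int, samples: int) -> int:
    """Total number of generated solutions, N x n."""
    return n_problems * samples


def success_rate(graded: List[bool]) -> float:
    """Fraction of one problem's sampled solutions that were graded correct."""
    return sum(graded) / len(graded)


def is_solved(graded: List[bool], threshold: float = 0.5) -> bool:
    """Hypothetical criterion: a problem counts as solved when more than
    `threshold` of its samples are correct. The study's actual rule is
    not given in the abstract."""
    return success_rate(graded) > threshold


print(total_solutions(N_PROBLEMS, SAMPLES_PER_PROBLEM))
```

Under this assumed rule, a problem with four of five correct samples has a success rate of 0.8 and counts as solved, while one with a single correct sample does not.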
Bralin et al. studied this question.
www.synapsesocial.com/papers/68d6e1248b2b6861e4c3f802 — DOI: https://doi.org/10.48550/arxiv.2508.20941
Amir Bralin
N. Sanjay Rebello