What type of study is this?

This is a quantitative study.

October 3, 2025 · Open Access

Using LLM to Identify Pillars of the Mind Within Physics Learning Materials

Key Points

Large language models can help identify cognitive pillars in physics learning materials, which could make the Five Pillars of the Mind framework more practical to apply in physics education.
The case study analyzed eight pages of materials aimed at 12- to 14-year-olds, comparing AI results with manual assessments.
MAXQDA AI Assist achieved the highest precision (1.00), while both OpenAI models hallucinated and falsely identified several concepts, which lowered their precision and F1-Score.
Future applications of this method may extend to analyzing students' written work and video activities for deeper educational insights.

Abstract

Artificial intelligence tools are quickly being applied in many areas of science, including learning sciences. Learning requires various types of thinking, sustained by distinct sets of neural networks in the brain. Labelling these systems gives us tools to manage them. This paper presents a pilot application of Large Language Models (LLMs) to physics textbook analysis, grounded in a well-developed neural network theory known as the Five Pillars of the Mind. The domain-specific networks, innate sense, and the five pillars provide a framework with which to examine how physics is learnt. For example, one can identify which pillars are active when discussing a physics concept. Identifying which pillars belong to which physics concept may be significantly influenced by the bias of the author and could be too time-consuming for longer, more complex texts involving physics concepts. Therefore, using LLMs to identify pillars could enhance the application of this framework to physics education. This article presents a case study in which we used selected Large Language Models to identify pillars within eight pages of learning material concerning forces aimed at 12- to 14-year-old pupils. We used GPT-4o and o4-mini, as well as MAXQDA AI Assist. Results from these models were compared with the authors’ manual analysis. Precision, recall, and F1-Score were used to evaluate the results quantitatively. MAXQDA AI Assist obtained the best results with 1.00 precision, 0.67 recall, and an F1-Score of 0.80. Both products by OpenAI hallucinated and falsely identified several concepts, resulting in low precision and, consequently, low F1-Score. As predicted, ChatGPT o4-mini scored twice as high as ChatGPT 4o. The method proved to be promising, and its future development has the potential to provide research teams with analysis not only of written learning material, but also of pupils’ written work and their video-recorded activities.
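The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes each identification is a (concept, pillar) pair and that the manual analysis is the reference set. The function names, the pair representation, and the placeholder labels are hypothetical and are not taken from the paper. The only figures from the source are the reported precision (1.00), recall (0.67), and F1-Score (0.80).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, reference):
    """Score model identifications against a manual reference analysis.

    Both arguments are iterables of hashable items, here assumed to be
    (concept, pillar) pairs. This representation is an assumption.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)    # found by both model and authors
    false_pos = len(predicted - reference)   # found only by the model
    false_neg = len(reference - predicted)   # missed by the model
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if reference else 0.0
    return precision, recall, f1_score(precision, recall)


# Consistency check on the values reported for MAXQDA AI Assist.
reported_f1 = round(f1_score(1.00, 0.67), 2)

# Hypothetical toy data (NOT the study's data): the model finds two of the
# three reference pairs and adds nothing wrong.
reference = {("gravity", "pillar_A"), ("friction", "pillar_B"), ("mass", "pillar_C")}
predicted = {("gravity", "pillar_A"), ("friction", "pillar_B")}
toy_precision, toy_recall, toy_f1 = precision_recall_f1(predicted, reference)
```

Under these assumptions, a tool that makes no false identifications but misses some reference pairs has perfect precision and reduced recall, which is the pattern reported for MAXQDA AI Assist. Hallucinated pairs count as false positives and lower precision, and with it the F1-Score.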
