Abstract: Recent advancements in large language models (LLMs) have shown potential in enhancing educational practices, particularly in technology-assisted learning environments. This study critically evaluates the reasoning capabilities of LLMs, such as ChatGPT, within the context of chemistry education. We designed targeted adversarial prompts that challenge the models to solve complex chemistry problems and assessed their performance. By pushing the boundaries of LLM reasoning, we aim to identify their limitations and strengths in handling queries within the chemistry domain. Our findings expose inherent weaknesses in current AI systems, emphasizing the necessity of cautious AI deployment in teaching methodologies. We argue for a balanced approach, leveraging the benefits of LLMs while mitigating their limitations, to facilitate their seamless adoption in education.
Uçar et al. (Fri,) studied this question.