Introduction Artificial intelligence (AI), particularly language models such as ChatGPT (OpenAI, San Francisco, CA, USA), is becoming increasingly important in medical education and knowledge assessment. Prior studies have demonstrated the growing effectiveness of AI in preparing students for medical examinations, including the Medical Final Examination (Lekarski Egzamin Końcowy (LEK)) of Poland and the National Specialty Examination across various disciplines. This raises important questions regarding its potential role as a tool to support specialist training.
Objective The aim of this study is to evaluate the effectiveness of the advanced GPT-5 model in addressing problems in child and adolescent psychiatry. The focus is on the accuracy of answers, their correctness, and the model's self-declared confidence levels to assess its potential efficacy in education.
Methodology The study analyzed the official spring 2025 National Specialty Examination (Państwowy Egzamin Specjalizacyjny (PES)) of Poland in child and adolescent psychiatry. The exam consisted of 120 multiple-choice questions with a single correct answer. GPT-5 was familiarized with the examination rules and then presented with the questions in the Polish language. Answers were evaluated using the official Centre for Medical Examination (CEM) key. In addition, the model provided a confidence rating for each answer on a five-point scale. Questions were categorized as either clinical or theoretical. Statistical analysis was conducted using the chi-square test and the Mann-Whitney U test.
Results GPT-5 answered 97 questions correctly (80.8%), surpassing the required passing threshold. No significant difference was observed between the accuracy of responses to clinical versus theoretical questions (p = 0.399). However, correct answers were significantly more likely when the model reported higher confidence levels (p = 0.012).
Conclusions GPT-5 demonstrated strong performance in the National Specialty Examination of Poland in child and adolescent psychiatry, supporting its potential as a supplementary tool in specialist education. Confidence ratings may provide an additional metric for evaluating the reliability of answers. Nevertheless, broader integration of AI in medical education requires experts overseeing the process and further research across diverse medical disciplines.
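The reported accuracy and the clinical-versus-theoretical comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code. The function names accuracy and chi_square_2x2 are hypothetical, and the chi-square test is assumed to be the Pearson form without continuity correction, since the abstract does not say which variant was used. The per-category counts are not given in the abstract, so only the overall score is computed here.

```python
import math
from typing import Tuple


def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test on the 2x2 table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Overall score taken from the abstract: 97 correct out of 120 questions.
overall = accuracy(97, 120)
print(f"{overall:.1%}")  # prints 80.8%
```

To reproduce the category comparison, one would pass the correct and incorrect counts for clinical questions as the first row and those for theoretical questions as the second row. Those counts would have to come from the full paper.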
Anna Kowalczyk
Michalina Loson-Kawalec
Ann Tabor
Cureus
Kowalczyk et al. studied this question.
www.synapsesocial.com/papers/68d6d8ba8b2b6861e4c3f01e — DOI: https://doi.org/10.7759/cureus.92982