What type of study is this?

This is a quantitative study.

September 24, 2025 (Open Access)

Comparison of GPT-5 Responses With the Official Results of the Polish Specialized Psychiatric Examination in Child and Adolescent Psychiatry

Key Points

GPT-5 answered 97 of 120 questions correctly (80.8%) on the Polish National Specialty Examination in child and adolescent psychiatry.
Higher self-reported confidence ratings were significantly associated with correct answers (p = 0.012).
No significant differences in accuracy were observed between clinical and theoretical questions (p = 0.399).
AI shows potential as a supplementary tool in medical education, but its use requires expert oversight and further research.

Abstract

Introduction: Artificial intelligence (AI), particularly language models such as ChatGPT (OpenAI, San Francisco, CA, USA), is becoming increasingly important in medical education and knowledge assessment. Prior studies have demonstrated the growing effectiveness of AI in preparing students for medical examinations, including the Medical Final Examination (Lekarski Egzamin Końcowy (LEK)) of Poland and the National Specialty Examination across various disciplines. This raises important questions regarding its potential role as a tool to support specialist training.

Objective: The aim of this study is to evaluate the effectiveness of the advanced GPT-5 model in addressing problems in child and adolescent psychiatry. The focus is on the accuracy of its answers and the model's self-declared confidence levels, in order to assess its potential efficacy in education.

Methodology: The study analyzed the official spring 2025 National Specialty Examination (Państwowy Egzamin Specjalizacyjny (PES)) of Poland in child and adolescent psychiatry. The exam consisted of 120 multiple-choice questions, each with a single correct answer. GPT-5 was familiarized with the examination rules and then presented with the questions in Polish. Answers were evaluated using the official Centre for Medical Examination (CEM) key. In addition, the model provided a confidence rating for each answer on a five-point scale. Questions were categorized as either clinical or theoretical. Statistical analysis was conducted using the chi-square test and the Mann-Whitney U test.

Results: GPT-5 answered 97 questions correctly (80.8%), surpassing the required passing threshold. No significant difference was observed between the accuracy of responses to clinical versus theoretical questions (p = 0.399). However, correct answers were significantly more likely when the model reported higher confidence levels (p = 0.012).

Conclusions: GPT-5 demonstrated strong performance in the National Specialty Examination of Poland in child and adolescent psychiatry, supporting its potential as a supplementary tool in specialist education. Confidence ratings may provide an additional metric for evaluating the reliability of answers. Nevertheless, broader integration of AI in medical education requires expert oversight and further research across diverse medical disciplines.
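
The accuracy figure and the chi-square comparison mentioned in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, no continuity correction is applied (the abstract does not say whether one was used), and the clinical/theoretical split in the usage section is a placeholder, since the abstract reports only the overall 97 of 120 result. The placeholder table is therefore not expected to reproduce the reported p = 0.399.

```python
import math


def accuracy_percent(correct: int, total: int) -> float:
    """Share of correct answers, expressed as a percentage."""
    return 100.0 * correct / total


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]], e.g. rows = question type,
    columns = correct / incorrect. No continuity correction is applied.
    Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Overall accuracy from the abstract: 97 correct out of 120.
    print(f"Accuracy: {accuracy_percent(97, 120):.1f}%")

    # HYPOTHETICAL split of the 120 questions; the per-category counts
    # are NOT given in the abstract and are used only to show the call.
    placeholder_table = [
        [50, 10],  # clinical: correct, incorrect (placeholder)
        [47, 13],  # theoretical: correct, incorrect (placeholder)
    ]
    stat, p = chi_square_2x2(placeholder_table)
    print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

The confidence-versus-correctness comparison would follow the same pattern with a Mann-Whitney U test on the five-point ratings; it is omitted here because the per-question ratings are not part of the abstract.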

Cite This Study

Kowalczyk et al. studied this question.

synapsesocial.com/papers/68d6d8ba8b2b6861e4c3f01e https://doi.org/10.7759/cureus.92982
