The aim of this study is to evaluate the potential, reliability, and limitations of ChatGPT-4o in answering text-based questions, and its effectiveness in clinical decision support processes, based on the 5th edition of the BI-RADS Atlas and the ACR breast cancer screening guidelines. A total of 100 questions (50 multiple-choice and 50 true/false) prepared by two radiologists were submitted to ChatGPT-4o between November 5 and 19. The answers provided by ChatGPT-4o were evaluated at baseline and 14 days later by both radiologists for accuracy and comprehensiveness using a Likert scale. Group comparisons were performed using Mann-Whitney U and Wilcoxon tests; response consistency was evaluated with Cohen's Kappa, and the difference in overall accuracy with a two-proportion z-test. The increase in overall accuracy from 86% to 95% was statistically significant according to the two-proportion z-test (p = .030). Comparisons between the two sessions revealed statistically significant increases in the accuracy (p = .013, r = .35, 95% CI 0.09, 0.61) and comprehensiveness (p = .014, r = .35, 95% CI 0.09, 0.61) ratings of true/false questions. In contrast, no significant difference was found between sessions in the accuracy (p = .180, r = .19, 95% CI -0.09, 0.47) or comprehensiveness (p = .180, r = .19, 95% CI -0.09, 0.47) ratings of multiple-choice questions. In addition, group comparisons evaluating the effect of question format on performance revealed no significant difference in accuracy (p = .661, r = -0.04, 95% CI -0.23, 0.16) or comprehensiveness (p = .708, r = -0.04, 95% CI -0.23, 0.16). The consistency of ChatGPT-4o responses was supported by Cohen's Kappa coefficients, all statistically significant (p < .001), with 95% confidence intervals ranging from -.038 to 1.084. ChatGPT-4o performed strongly on both multiple-choice and true/false questions, with overall accuracy improving from 86% in the first test to 95% after 14 days.
ChatGPT-4o holds significant potential as a clinical decision support tool for healthcare professionals.
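The reported overall accuracy comparison can be checked with a short calculation. The sketch below is illustrative only and is not the authors' analysis code: it assumes the percentages correspond to 86 and 95 correct answers out of 100 questions, treats the two sessions as independent samples, and uses the pooled standard error with a two-sided p-value. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided, pooled two-proportion z-test (illustrative sketch)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts: 86/100 correct at baseline, 95/100 correct after 14 days.
z, p = two_proportion_z_test(86, 100, 95, 100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the result agrees with the reported p = .030.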
Bilgen Mehpare Özer
Emrullah Korkmaz
Ministry of Health
Özer et al. studied this question.
www.synapsesocial.com/papers/68c93fe601120bef803baf24 — DOI: https://doi.org/10.1007/s10278-025-01663-8