December 3, 2025 · Open Access

Comparative performance of ChatGPT-4o, ChatGPT-5, and Gemini 2.5 Flash on Persian internal medicine subspecialty board exams

Key Points

Gemini 2.5 Flash achieved the highest accuracy (79.9%) on the Persian internal medicine subspecialty board exams.
ChatGPT-5 significantly outperformed ChatGPT-4o (74.5% vs. 68.9% accuracy), indicating improvement between model versions.
An artificial neural network combining the outputs of all three models reached 81.6% accuracy, suggesting that integrating models can enhance performance.
The results point to a potential role for AI in medical education and clinical practice, but further research on practical applications is needed.

Abstract

This study compared the performance of ChatGPT-4o, ChatGPT-5, and Gemini 2.5 Flash on the 2025 Iranian internal medicine subspecialty board examinations. A total of 650 multiple-choice questions from six subspecialties were tested, excluding image-based items. Each question was presented in Persian, and responses were evaluated against the official answer key. Accuracy rates were 68.9% for ChatGPT-4o, 74.5% for ChatGPT-5, and 79.9% for Gemini 2.5 Flash, with Gemini performing significantly better than both ChatGPT versions. ChatGPT-5 also showed a significant improvement over ChatGPT-4o, confirming rapid progress in model development. Subspecialty analysis revealed stronger results in rheumatology and respiratory medicine than in nephrology, while question type and length had no significant impact on outcomes. An artificial neural network that combined the outputs of all three models reached 81.6% accuracy, slightly exceeding Gemini alone. These findings highlight Gemini 2.5 Flash as the most reliable model for this high-stakes internal medicine exam. The results support the growing role of advanced AI systems as assistants in medical education and clinical practice. However, further research is needed to assess their use in multimodal and real-world clinical tasks.
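To give a sense of the sampling uncertainty around the reported accuracies, the sketch below computes 95% Wilson score intervals for each model. It is an illustration only and not the study's own analysis: it assumes every reported accuracy was computed over all 650 questions, and the names `wilson_interval`, `N_QUESTIONS`, and `reported` are hypothetical. The neural-network ensemble is left out because the abstract does not say which questions its 81.6% figure was measured on.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Assumption: each accuracy below was measured on all 650 questions.
N_QUESTIONS = 650

# Accuracies as reported in the abstract.
reported = {
    "ChatGPT-4o": 0.689,
    "ChatGPT-5": 0.745,
    "Gemini 2.5 Flash": 0.799,
}

intervals = {name: wilson_interval(p, N_QUESTIONS) for name, p in reported.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {reported[name]:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Overlapping intervals here would not contradict the reported significance: all three models answered the same questions, so a paired comparison (for example, McNemar's test on per-question results) is the more appropriate check, and it needs item-level data that this summary does not provide.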

Authors

Shahab Sheikhalishahi

Shahid Sadoughi University of Medical Sciences and Health Services

Alireza Haddadi

Saina Sadeghipour

Shahid Sadoughi University of Medical Sciences and Health Services

Journals

Scientific Reports

Institutions

Shahid Sadoughi University of Medical Sciences and Health Services
