Aims: The integration of artificial intelligence into pathology has accelerated with the adoption of digital workflows. Large language models such as ChatGPT offer unique opportunities but have yet to be systematically evaluated in diagnostic image interpretation.

Methods: In this comparative study, 24 histopathological images representing various tissue types and pathological entities were evaluated by ChatGPT-4o mini and by 15 experienced pathologists. The model was prompted with a standard diagnostic query and had no access to clinical information; the pathologists independently assessed the same images. Responses were categorized as correct, false positive, false negative, low-impact error, or no interpretation. Standard diagnostic metrics (accuracy, sensitivity, specificity) were calculated, and group comparisons were made using McNemar’s test and Fisher’s exact test. Interobserver agreement among pathologists was analyzed using Fleiss’ kappa.

Results: ChatGPT-4o mini achieved an accuracy of 71.4%, with a sensitivity of 60.0% and a specificity of 77.8%. The pathologists’ average accuracy was 89.8%, with 97.7% sensitivity and 87.1% specificity. Low-impact errors were more frequent with ChatGPT-4o mini (33.3%) than with pathologists (6.9%). McNemar’s test showed a statistically significant difference in favor of the pathologists. Interobserver agreement among pathologists was relatively low.

Conclusion: While ChatGPT-4o mini demonstrated partial diagnostic capability, it underperformed experienced pathologists. The absence of clinical context likely affected the results. Future artificial intelligence models that integrate image analysis with clinical data may perform better. Despite these limitations, this study highlights the potential of ChatGPT as a supportive diagnostic tool in pathology.
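The metrics and tests named in the Methods are standard, and a short standard-library Python sketch may help readers see what each one computes. This is not the authors’ analysis code. It assumes responses have already been reduced to true/false positive/negative counts (the abstract does not say how “low-impact error” and “no interpretation” were mapped onto those counts), uses the exact binomial form of McNemar’s test (the study may have used the chi-square form), and applies a generic two-sided Fisher’s exact test to a 2x2 table. All function names and example counts are hypothetical and are not the study’s data.

```python
"""Illustrative sketch of the statistics named in the abstract (hypothetical data)."""
from math import comb


def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives correctly cleared
    }


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value for paired binary outcomes.

    b: cases rater A got right and rater B got wrong; c: the reverse.
    Under H0 the b + c discordant pairs split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories matrix of rating counts.

    counts[i][j] = number of raters who put subject i (e.g. an image) in
    category j; every row must sum to the same number of raters n.
    """
    N, n = len(counts), sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    k = len(counts[0])
    # Overall proportion of assignments to each category.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Observed agreement per subject: fraction of rater pairs that agree.
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (P_bar - P_e) / (1 - P_e)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT the study's data.
    print(diagnostic_metrics(tp=6, fp=2, tn=7, fn=4))
    print(mcnemar_exact(b=1, c=8))
    print(fisher_exact_2x2(6, 2, 1, 4))
    # 3 images, 4 raters, categories = (benign, malignant)
    print(fleiss_kappa([[4, 0], [1, 3], [2, 2]]))
```

McNemar’s test uses only the discordant pairs (images one reader classified correctly and the other did not), which suits a paired design in which the model and the pathologists read the same images; Fisher’s exact test is the usual choice for comparing proportions between independent groups when counts are small.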
Aghajan Musali
Jamal Musayev
TURKISH MEDICAL STUDENT JOURNAL
Musali et al. studied this question.
synapsesocial.com/papers/69a528b3f1e85e5c73bf0452 — DOI: https://doi.org/10.4274/tmsj.galenos.2026.2025-10-1