Artificial intelligence (AI) is increasingly used in clinical decision-making to improve diagnostic accuracy, predictive performance, and treatment planning across multiple specialties. This systematic review evaluated the accuracy and clinical outcomes of AI-based systems compared with standard clinical practice. A comprehensive literature search of PubMed, Embase, Scopus, and the Cochrane Library was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines. Six studies were included, with a combined sample of approximately 1.1 million patients and imaging datasets. Because of substantial heterogeneity in study populations, AI models, clinical settings, and outcome measures, a narrative synthesis was performed. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) and Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tools. Overall, AI models demonstrated strong performance, with area under the curve (AUC) values ranging from 0.85 to 0.96, sensitivity up to 97%, and specificity up to 93%. Performance was strongest in radiology and dermatology, where it was comparable to or better than that of clinicians, whereas predictive models used in intensive care unit (ICU) settings showed more variable performance. In conclusion, AI demonstrates promising diagnostic and predictive accuracy, but the evidence is derived predominantly from retrospective studies, highlighting the need for prospective multicentre trials before routine clinical implementation.
Chari et al. studied this question.