Algorithmic bias in surgical risk prediction models, driven by unrepresentative training data and the complex role of race as a variable, risks perpetuating existing disparities in patient safety and care equity.
This review highlights the need for multicenter, demographically diverse validation studies and equity-focused governance to ensure AI in surgery serves all patients safely and equitably.
Abstract
The increasing adoption of machine learning and artificial intelligence in surgical risk prediction has introduced new challenges related to the fairness and equity of these algorithms. These models range from regression-based risk calculators to machine learning systems. Differential performance may reflect poor calibration within a group, which is mainly a safety concern, or unequal performance between groups, which is mainly an equity concern. Some studies have found that predictive models trained on surgical registry data perform differently across racial and ethnic groups, raising concerns that these tools could perpetuate or amplify existing disparities in surgical care. This review examines the sources and mechanisms of algorithmic bias in surgical risk prediction models, evaluates the evidence for differential model performance across patient subgroups, and discusses emerging debiasing strategies and regulatory frameworks. Training datasets from major surgical registries frequently contain incomplete or insufficiently granular race and ethnicity data, and the inclusion of race as a predictive variable remains controversial. Individual studies have reported lower sensitivity (equivalently, higher false-negative rates) for specific subgroups, which could lead to underestimation of surgical risk in those groups; however, such discrimination-based measures do not by themselves establish miscalibration or demonstrate harm. Debiasing techniques, including reweighting, adversarial training, and fairness-aware multitask learning, have shown promise but remain largely untested in surgical contexts and are constrained by inherent trade-offs between within-group calibration and error-rate parity across groups. Although regulatory bodies have begun to address algorithmic fairness, standardized auditing frameworks for surgical prediction models are still lacking.
This review highlights the need for multicenter, demographically diverse validation studies, transparent model reporting, and equity-focused governance to ensure that artificial intelligence in surgery serves all patients safely and equitably.
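The distinction between within-group calibration (a safety concern) and between-group error rates (an equity concern) can be made concrete with a small audit. The following is a minimal Python sketch under stated assumptions, not a method taken from the review: the function names `subgroup_audit` and `reweighing_weights`, the 0.5 decision threshold, and the toy data are all hypothetical choices for illustration. It reports per-group sensitivity and false-negative rate alongside an observed-to-expected (O/E) event ratio, a simple calibration-in-the-large check. It also computes weights for one common form of reweighting, Kamiran and Calders' reweighing, which gives each (group, outcome) cell the weight P(group)·P(outcome)/P(group, outcome) so that, assuming every group contains both outcomes, group membership and outcome are independent in the weighted training data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def subgroup_audit(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    groups: Sequence[str],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Dict[str, Dict[str, float]]:
    """Per-group sensitivity, false-negative rate (FNR), and O/E ratio."""
    rows_by_group = defaultdict(list)
    for y, p, g in zip(y_true, y_prob, groups):
        rows_by_group[g].append((y, p))

    report: Dict[str, Dict[str, float]] = {}
    for g, rows in rows_by_group.items():
        # Error-rate check, compared across groups (equity):
        # what share of true events does the model flag as high risk?
        event_probs = [p for y, p in rows if y == 1]
        flagged = sum(1 for p in event_probs if p >= threshold)
        sensitivity = flagged / len(event_probs) if event_probs else float("nan")
        # Within-group calibration-in-the-large (safety):
        # O/E > 1 means more events occurred than the model predicted.
        observed = sum(y for y, _ in rows)
        expected = sum(p for _, p in rows)
        report[g] = {
            "n": float(len(rows)),
            "sensitivity": sensitivity,
            "fnr": 1.0 - sensitivity,  # FNR = 1 - sensitivity
            "o_to_e": observed / expected if expected > 0 else float("nan"),
        }
    return report


def reweighing_weights(y_true: Sequence[int], groups: Sequence[str]) -> List[float]:
    """Reweighing: w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y), from empirical counts."""
    n = len(y_true)
    count_group = Counter(groups)
    count_outcome = Counter(y_true)
    count_joint = Counter(zip(groups, y_true))
    # Only observed (group, outcome) pairs are weighted, so the joint count is never zero.
    return [
        count_group[a] * count_outcome[y] / (n * count_joint[(a, y)])
        for a, y in zip(groups, y_true)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: 1 = postoperative complication, 0 = none.
    groups = ["A"] * 4 + ["B"] * 4
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_prob = [0.9, 0.6, 0.2, 0.3, 0.7, 0.3, 0.1, 0.4]
    for g, m in subgroup_audit(y_true, y_prob, groups).items():
        print(f"{g}: sensitivity={m['sensitivity']:.2f} "
              f"FNR={m['fnr']:.2f} O/E={m['o_to_e']:.2f}")
    # Expected output:
    # A: sensitivity=1.00 FNR=0.00 O/E=1.00
    # B: sensitivity=0.50 FNR=0.50 O/E=1.33
    print(reweighing_weights([1, 0, 0, 1], ["A", "A", "A", "B"]))
    # [1.5, 0.75, 0.75, 0.5]
```

In the toy data, group B has both a higher false-negative rate and an O/E ratio above 1, meaning more complications occurred than the model predicted. In general the two checks measure different properties and need not agree, which is why a gap in sensitivity alone does not establish miscalibration. The reweighing weights would typically be passed as per-sample weights when refitting a model; this sketch only computes them.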
Mohamed Mustaf Ahmed conducted a review of machine learning and artificial intelligence models for surgical risk prediction.