What is the clinical evidence from this study?

Study design: Review. Population: Surgical risk prediction models applied to diverse patient groups. Intervention: Machine learning and artificial intelligence models.

What question did this study set out to answer?

This review examines algorithmic bias in surgical risk prediction models and its implications for patient safety and equity.

June 24, 2026 | Open Access

Algorithmic bias in surgical risk prediction models and its impact on patient safety: a review

Key Result

Algorithmic bias in surgical risk prediction models, driven by unrepresentative training data and the complex role of race as a variable, risks perpetuating existing disparities in patient safety and care equity.

Key Points

Evaluated existing literature on algorithmic bias in machine learning-based surgical risk prediction models.
Discussed differential performance across racial and ethnic groups using surgical registry data.
Assessed debiasing strategies and the need for regulatory frameworks.
Identified lower sensitivity and higher false-negative rates in specific subgroups, indicating potential underestimation of surgical risk.
Highlighted the controversy around using race as a predictive variable due to existing disparities in care.
Called for multicenter validation studies and transparent reporting to enhance fairness in surgical prediction models.
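The subgroup findings above rest on two linked quantities: sensitivity (the share of patients with an adverse outcome whom the model flagged) and the false-negative rate (one minus sensitivity). The following is a minimal sketch of a per-group audit, not the method of any study covered by the review. The function name `subgroup_error_rates`, the group labels "A" and "B", and the toy data are all hypothetical, and predictions are assumed to be already thresholded to 0 or 1.

```python
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Return sensitivity and false-negative rate for each group.

    y_true: 1 if the adverse outcome occurred, else 0.
    y_pred: 1 if the model flagged the patient as high risk, else 0.
    groups: demographic group label for each patient.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:  # both rates are defined over actual positives only
            counts[group]["tp" if pred == 1 else "fn"] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        sensitivity = c["tp"] / positives
        rates[group] = {
            "sensitivity": sensitivity,
            "false_negative_rate": 1 - sensitivity,
        }
    return rates


# Hypothetical toy data: five patients in each of two groups.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
groups = ["A"] * 5 + ["B"] * 5

rates = subgroup_error_rates(y_true, y_pred, groups)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this toy data the model misses one of four adverse outcomes in group A and two of four in group B, which is the kind of gap that would point to underestimated risk in group B. Groups with no observed adverse outcomes are omitted from the result because neither rate is defined for them.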

Structured PICO

Population

Surgical risk prediction models and their application to diverse patient groups

Intervention

Machine learning and artificial intelligence in surgical risk prediction

Outcome

Algorithmic bias, fairness, equity, and patient safety

This review highlights the need for multicenter, demographically diverse validation studies and equity-focused governance to ensure AI in surgery serves all patients safely and equitably.

Limitations

Training datasets from major surgical registries frequently contain incomplete or poorly granular race and ethnicity data.
Debiasing techniques remain largely untested in surgical contexts and are constrained by inherent mathematical trade-offs.
Standardized auditing frameworks for demographic fairness in surgical prediction models are still lacking.
Complex ensemble methods and deep neural networks often function as opaque decision systems, complicating the identification of bias.

Abstract

The increasing adoption of machine learning and artificial intelligence in surgical risk prediction has introduced new challenges related to the fairness and equity of these algorithms. These models range from regression-based risk calculators to machine learning systems, and differential performance may reflect poor calibration within a group, which is mainly a safety concern, or unequal performance between groups, which is mainly an equity concern. Predictive models trained on surgical registry data have shown differential performance across racial and ethnic groups in some studies, raising concerns about the potential for these tools to perpetuate or amplify existing disparities in surgical care. This review examines the sources and mechanisms of algorithmic bias in surgical risk prediction models, evaluates the evidence for differential model performance across patient subgroups, and discusses emerging debiasing strategies and regulatory frameworks. Training datasets from major surgical registries frequently contain incomplete or poorly granular race and ethnicity data, and the inclusion of race as a predictive variable remains controversial. Individual studies have reported lower sensitivity or higher false-negative rates for specific subgroups, potentially leading to an underestimation of surgical risk in those groups, although such discrimination-based measures do not by themselves establish miscalibration or demonstrated harm. Debiasing techniques, including reweighting, adversarial training, and fairness-aware multitask learning, have shown promise but remain largely untested in surgical contexts and are constrained by inherent trade-offs between within-group calibration and error rate parity across groups. Although regulatory bodies have begun to address algorithmic fairness, standardized auditing frameworks for surgical prediction models are still lacking. This review highlights the need for multicenter, demographically diverse validation studies, transparent model reporting, and equity-focused governance to ensure that artificial intelligence in surgery serves all patients safely and equitably.
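Of the debiasing techniques named in the abstract, reweighting is the simplest to illustrate. The sketch below uses one common formulation, assumed here for illustration and not taken from the review: each training record gets the weight P(group) × P(label) / P(group, label), so that group membership and outcome are statistically independent in the weighted data. The function name `reweighting_weights` and the toy data are hypothetical.

```python
from collections import Counter


def reweighting_weights(groups, labels):
    """Return one training weight per record.

    weight = P(group) * P(label) / P(group, label), estimated from counts.
    Combinations that are rarer than independence would predict get
    weights above 1; over-represented combinations get weights below 1.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Hypothetical toy data: adverse outcomes are recorded far more often in group A.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighting_weights(groups, labels)
for g, y, w in zip(groups, labels, weights):
    print(g, y, round(w, 3))
```

After weighting, the adverse-outcome rate is the same in both toy groups. This changes only what the model is trained on; it does not by itself guarantee equal error rates or good within-group calibration at prediction time, which is the trade-off the abstract describes.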
