What question did this study set out to answer?

This study aims to compare the predictive performance of machine learning models with established manual triage systems in emergency departments.

April 26, 2026 · Open Access

Comparative performance of manual triage systems and ML models in predicting clinical outcomes among patients transported to the emergency department by emergency medical services

Key Points

This study aims to compare the predictive performance of machine learning models with established manual triage systems in emergency departments.
Prospective single-center study involving 899 adult patients transported by EMS.
Three ML models (Random Forest, Gradient Boosting Machine, CatBoost) were developed using clinical variables.
Performance was evaluated using AUROC, AUPRC, and Brier score, with DeLong’s test used for statistical comparisons.
Adverse clinical outcomes occurred in 9.1% of patients.
ML models demonstrated high performance (AUROC > 0.80), with metrics comparable to those of manual triage systems.
No statistically significant performance differences were observed (all p > 0.05).
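
The three headline metrics named above can be illustrated with a minimal, standard-library sketch. This is not the authors' code: the function names and the ten toy patients are assumptions made purely for illustration, and the average-precision variant shown is one common estimator of AUPRC that assumes no tied scores.

```python
# Illustrative only: toy labels and probabilities, not data from the study.

def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / sum(y_true)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical cohort: 2 adverse outcomes among 10 patients.
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
probs = [0.05, 0.10, 0.80, 0.20, 0.15, 0.30, 0.25, 0.02, 0.08, 0.12]

print("AUROC:", round(auroc(labels, probs), 4))
print("AUPRC (average precision):", round(average_precision(labels, probs), 4))
print("Brier score:", round(brier_score(labels, probs), 4))
```

With a low event rate such as the 9.1% reported here, AUPRC is a useful complement to AUROC because its chance-level baseline equals the prevalence rather than 0.5.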

Abstract

Accurate triage in emergency departments (EDs) is essential for timely care and efficient resource utilization, particularly in high-acuity settings. Machine learning (ML) models have the potential to enhance risk stratification; however, their performance relative to established manual triage systems remains uncertain. This prospective single-center study included 899 adult patients transported by emergency medical services (EMS) and triaged to the ED red zone. Three ML models (Random Forest, Gradient Boosting Machine, and CatBoost) were developed using routinely available clinical variables. The dataset was split into training (70%) and test (30%) sets. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision–recall curve (AUPRC), Brier score, and classification metrics. DeLong’s test was used for statistical comparisons. Model interpretability was assessed using SHAP (Shapley Additive Explanations) analysis. Adverse clinical outcomes (intensive care unit admission or in-hospital mortality) occurred in 9.1% of patients. All manual triage systems (Emergency Severity Index, Manchester Triage System, Turkish Ministry of Health Triage System) were significantly associated with adverse outcomes. The ML models achieved high discriminative performance (AUROC > 0.80) and showed comparable or numerically higher performance across multiple metrics, including AUPRC and Brier score. However, no statistically significant differences were observed between ML models and manual triage systems (all p > 0.05). ML models also demonstrated comparable or improved calibration, particularly in low-prevalence settings. SHAP analysis identified age and key physiological parameters as the most influential predictors. ML models demonstrated robust and consistent performance in predicting adverse outcomes among high-acuity EMS patients, with performance broadly comparable to established manual triage systems. These findings support the potential role of ML-based approaches as complementary decision support tools in ED triage, warranting further validation in multicenter settings.
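
To make the statistical comparison concrete, the following is a minimal sketch of DeLong's test for two correlated AUROCs computed on the same patients. It is a standard-library illustration under stated assumptions, not the study's implementation: `delong_test`, the twelve toy patients, the model probabilities, and the ordinal triage levels (higher meaning more urgent) are all hypothetical.

```python
import math


def _psi(x, y):
    """Pairwise comparison kernel: 1 if x > y, 0.5 if tied, 0 otherwise."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_test(y_true, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for two scores on the same cases."""
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos), len(neg)  # both must be >= 2 for the variance estimate

    aucs, v10, v01 = [], [], []
    for s in (scores_a, scores_b):
        # Structural components: per-positive and per-negative placement values.
        v10_k = [sum(_psi(s[i], s[j]) for j in neg) / n for i in pos]
        v01_k = [sum(_psi(s[i], s[j]) for i in pos) / m for j in neg]
        v10.append(v10_k)
        v01.append(v01_k)
        aucs.append(sum(v10_k) / m)

    # Variance of the AUROC difference from the paired component differences.
    d10 = [a - b for a, b in zip(v10[0], v10[1])]
    d01 = [a - b for a, b in zip(v01[0], v01[1])]
    var = _variance(d10) / m + _variance(d01) / n
    if var == 0:
        raise ValueError("zero variance: the test is undefined for these scores")

    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return aucs[0], aucs[1], z, p


# Hypothetical data: 3 adverse outcomes among 12 patients.
y = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
ml_prob = [0.05, 0.10, 0.80, 0.20, 0.15, 0.30, 0.25, 0.02, 0.08, 0.12, 0.60, 0.40]
triage_level = [1, 2, 5, 3, 2, 3, 3, 1, 1, 2, 4, 4]  # assumed: higher = more urgent

auc_ml, auc_triage, z, p = delong_test(y, ml_prob, triage_level)
print(f"AUROC ML={auc_ml:.3f}, triage={auc_triage:.3f}, z={z:.3f}, p={p:.3f}")
```

Because both scores are evaluated on the same patients, the test works with the paired differences of the per-case components instead of treating the two AUROCs as independent. With a modest number of events, a numerically higher AUROC can easily fail to reach significance, which is consistent with the pattern reported in the abstract.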
