September 10, 2025

Machine Learning Models for Predicting COVID-19 Mortality Using Epidemiological Features

Key Points

The Random Forest model, trained on data balanced with SMOTE-ENN, achieved 89.44% accuracy in predicting COVID-19 mortality.
SHAP-based feature selection identified key predictors such as Age and Pneumonia, highlighting critical risk factors for fatality.
Sampling techniques, including SMOTE and RUS, addressed class imbalance in the COVID-19 dataset, enhancing model performance.
This research underscores the importance of using machine learning methods for informed clinical decision-making during pandemics.
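
As an illustration of the simplest sampling technique named above, Random Under-Sampling (RUS), the following is a minimal sketch rather than the study's implementation. It uses only the Python standard library; the function name `random_under_sample`, the toy ages, and the labels are hypothetical and do not come from the Mexican dataset.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class has
    as many rows as the smallest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Size of the smallest (minority) class.
    n_min = min(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement; the minority class is kept in full.
        kept = rng.sample(group, n_min)
        out_rows.extend(kept)
        out_labels.extend([label] * n_min)
    return out_rows, out_labels


# Hypothetical toy data: 8 survivors (0) and 2 deaths (1); one feature (age).
ages = [[34], [45], [29], [51], [38], [62], [41], [55], [78], [83]]
died = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_under_sample(ages, died)
print(Counter(balanced_labels))
```

RUS discards majority-class records, which is why oversampling methods such as SMOTE are often tried alongside it when the minority class is small.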

Abstract

Identifying COVID-19 patients at high risk of fatality is critically important for healthcare professionals, as it supports informed decision-making and strengthens the capacity of medical systems to manage emerging crises. However, COVID-19 datasets are frequently highly imbalanced, with substantially fewer fatal than non-fatal cases, which poses a challenge for developing effective machine learning algorithms. This study aims to develop a high-performing machine learning approach to predict COVID-19 mortality using a Mexican epidemiological dataset. To tackle the class imbalance, several sampling techniques are applied: SMOTE, SMOTE-ENN, ADASYN, SMOTE-Tomek, and Random Under-Sampling (RUS). Predictive models are built using several machine learning algorithms: Logistic Regression, Decision Tree, Gaussian Naïve Bayes, K-Nearest Neighbors, and Random Forest. In addition, we performed feature selection analysis using the SHAP technique to determine the most relevant attributes for predicting COVID-19 mortality. The results show that the Random Forest model, trained on data balanced with the SMOTE-ENN technique, yielded the best performance, with 89.44% accuracy, 87.88% recall, and an 88.74% ROC AUC score. Furthermore, the feature selection analysis shows that Type of Patient, Age, Pneumonia, Intubation, and contact with COVID-19-infected patients are the key attributes for predicting the risk of COVID-19 fatality among hospitalized individuals.
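
To see why recall is reported next to accuracy on an imbalanced dataset, consider the minimal sketch below. It is an illustration under stated assumptions, not the study's evaluation code: the helper `accuracy_and_recall` and the 100-patient toy labels are hypothetical. A model that predicts "survives" for every patient scores high accuracy while detecting no deaths at all.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for binary labels.

    accuracy = correct predictions / all predictions
    recall   = true positives / (true positives + false negatives)
    """
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = correct / len(pairs)
    # Guard against a label set with no positive cases.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical cohort: 5 deaths (1) among 100 patients.
y_true = [1] * 5 + [0] * 95

# A degenerate "model" that always predicts survival.
always_survive = [0] * 100

acc, rec = accuracy_and_recall(y_true, always_survive)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

Because the always-survive baseline already reaches 95% accuracy on this toy cohort, a mortality model is only useful if its recall on the fatal class is also high, which is the property the sampling techniques aim to improve.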
