Objective: To identify effective data analytics and machine learning solutions that can support decision-making in the medical domain and contribute to the understanding of COVID-19. In this study, we analyze anonymized electronic medical records of 4711 patients with COVID-19 admitted to a hospital in Atlanta. A further aim was to determine which factors most affected disease severity and mortality risk among these patients.
Methods: We used random forest, LightGBM, XGBoost, CatBoost, KNN, SVM, logistic regression, and MLP neural network models. The models were evaluated with metrics appropriate to each type of prediction, chiefly accuracy, F1-score, and ROC AUC score. To identify the important features, we used several statistical methods, LASSO regression, and SHAP values, an explainable artificial intelligence (XAI) method for model explainability. The best models were implemented in a web application and tested by medical experts. The model for predicting mortality risk was tested on a validation cohort of 45 patients from the Department of Infectiology and Travel Medicine, L. Pasteur University Hospital in Košice (UNLP).
Results: The best model for predicting COVID-19 disease severity was LightGBM, with an accuracy of 88.4% using all features and 89.5% using the eight most important features. The best model for predicting mortality risk was also LightGBM, which achieved a ROC AUC score of 83.7% and a classification accuracy of 81.2% using all features. A simplified model trained on the 15 most important features achieved a ROC AUC score of 83.6% and a classification accuracy of 80.5%. We deployed the simplified models for predicting COVID-19 disease severity and the risk of COVID-19-related death in a web-based application and tested them with medical experts. This test resulted in a ROC AUC score of 83.6% and an overall prediction accuracy of 73.3%.
Lohaj et al. studied this question.
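The three metrics named in the abstract (accuracy, F1-score, and ROC AUC) can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all assumptions made up for illustration, and none of the values come from the study's data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive case is scored above a random
    negative case; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = event, 0 = no event).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(f"accuracy={acc:.4f} f1={f1:.4f} roc_auc={auc:.4f}")
```

Accuracy and F1-score depend on the chosen threshold, whereas ROC AUC is computed from the ranking of the scores alone. This is one reason a model can keep a similar ROC AUC on a new cohort while its accuracy at a fixed threshold changes.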