During the COVID-19 pandemic, researchers explored a variety of methods for detecting COVID-19. In the dataset used for this study, COVID-19 patients were identified using chest computed tomography (CT) images. High dimensionality is a frequent obstacle in machine learning image classification. Accordingly, this study combined three dimensionality reduction methods with machine learning algorithms to improve classification. Principal component analysis (PCA), uniform manifold approximation and projection (UMAP), and diffusion maps were applied to the dataset to extract the most important features of the chest CT images. The extracted features were then given as input to either logistic regression or the extreme gradient boosting (XGBoost) algorithm to perform classification. Models were trained using stratified cross-validation on the training set, and final performance was evaluated on a test set held out at the patient level to prevent data leakage. Additional imbalance-aware metrics were used to assess robustness given the difference in class distribution. The strongest model identified in this study was diffusion maps combined with logistic regression. Evaluated against existing models from similar studies in recent years, this model performed strongly in detecting COVID-19 cases from chest CT images. On the held-out test set, it achieved 97.35% accuracy, 92.16% sensitivity, and 98.59% specificity in differentiating COVID-19-positive cases from healthy, non-COVID-19 cases. This study aimed to detect COVID-19 without the use of viral testing. Importantly, the method could assist clinicians in making an initial diagnosis, especially when viral testing is unavailable or not timely enough for the patient’s case. The study also provides deeper insight into how well various dimensionality reduction methods suit biomedical imaging data.
Somodi et al. (Mon,) studied this question.
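
For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity can be computed from binary predictions. It is an illustration only, not the study's code: the names `ConfusionCounts`, `confusion_counts`, and the toy labels are hypothetical, and balanced accuracy is included only as one common example of an imbalance-aware metric, since the abstract does not name the ones actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # COVID-19-positive cases predicted positive
    fn: int  # COVID-19-positive cases predicted negative
    tn: int  # non-COVID-19 cases predicted negative
    fp: int  # non-COVID-19 cases predicted positive


def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells from paired labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fn=fn, tn=tn, fp=fp)


# The functions below assume both classes occur in y_true,
# so none of the denominators is zero.

def accuracy(c):
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c):
    """Fraction of positive cases that were detected (recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Fraction of negative cases that were correctly ruled out."""
    return c.tn / (c.tn + c.fp)


def balanced_accuracy(c):
    """Mean of sensitivity and specificity; assumed here as an
    example of an imbalance-aware metric."""
    return 0.5 * (sensitivity(c) + specificity(c))


# Toy labels (made up for illustration): 4 positive, 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(counts), sensitivity(counts), specificity(counts))
```

Because sensitivity and specificity are each computed within a single class, they are unaffected by how many healthy cases the test set contains relative to COVID-19 cases, whereas accuracy is. That is one reason to report all three together when the classes are imbalanced.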