Machine learning models trained on class-imbalanced datasets most often misclassify the positive class. In addition, sophisticated models involve hyperparameters that must be tuned to optimal values. This study aims to tune the hyperparameters of random forest (RF) and support vector machine (SVM) models using 5-fold cross-validation, to build the best RF and SVM models for two data scenarios (the original training data and the oversampled training data), and to compare the models' performance on the training and testing data. The tuned RF hyperparameters, the number of instances in a leaf node and the tree depth, were 500 and 10, respectively, whereas the tuned SVM hyperparameters, gamma and the constant, were 0.001 and 500, respectively. The benchmark models achieved around 98% on the accuracy, precision, recall, and F1 score metrics. However, they performed poorly on the Matthews Correlation Coefficient (MCC) and Area Under the Curve (AUC), scoring 0.0000 and 0.5000, respectively: the models trained on the class-imbalanced dataset failed to predict the positive class. Although the best RF and SVM models trained on the oversampled dataset perform worse than both benchmark models on the four standard metrics, the best RF model improves MCC by approximately 0.07 (from 0.000 to 0.067) and AUC by approximately 0.11 (from 0.500 to 0.612), while the best SVM model shows slightly smaller improvements of approximately 0.06 in MCC (from 0.000 to 0.056) and 0.11 in AUC (from 0.500 to 0.611). Both the RF and SVM models improve at predicting the positive class, and the best RF model performs slightly better.
Brata et al. (Thu,) studied this question.
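
The abstract reports roughly 98% accuracy alongside an MCC of 0.0000 and an AUC of 0.5000. This is the pattern a classifier produces when it never predicts the positive class. The following is a minimal sketch in plain Python, not the authors' code: the sample size of 1000, the 2% positive rate, and all function names are assumptions chosen for illustration, and MCC is set to 0 when its denominator is zero, which is a common convention.

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced test set: 20 positives, 980 negatives (2% positive).
y_true = [1] * 20 + [0] * 980
# A model that always predicts the majority (negative) class.
y_pred = [0] * 1000

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.980
print(f"MCC      = {mcc(y_true, y_pred):.3f}")       # MCC      = 0.000
print(f"AUC      = {auc(y_true, y_pred):.3f}")       # AUC      = 0.500
```

Because accuracy rewards the 980 correct negative predictions, it hides the fact that every positive case is missed. MCC and AUC both fall to their chance-level values, which is why they are the more informative metrics in this setting.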