March 7, 2026 · Open Access

The Acquiring Optimal Models of Random Forest and Support Vector Machine Through Tuning Hyperparameters in Classifying the Imbalanced Data

Key Points

The aim is to tune hyperparameters of random forest and support vector machine models for better classification of imbalanced data.
Used 5-fold cross-validation for model training and evaluation.
Tuned hyperparameters: random forest number of instances per leaf node (500) and tree depth (10); support vector machine gamma (0.001) and constant C (500).
Compared model performances on original and oversampled datasets.
Benchmark models achieved around 98% accuracy, precision, recall, and F1 score.
Both models improved in predicting the positive class after tuning and training on the oversampled data.
The best RF model improved MCC from 0.000 to 0.067 and AUC from 0.500 to 0.612, while the best SVM model improved slightly less (MCC: 0.000 to 0.056; AUC: 0.500 to 0.611).
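The benchmark figures above (about 98% accuracy alongside an MCC of 0.000 and an AUC of 0.500) are consistent with a model that never predicts the positive class. The plain-Python sketch below illustrates this on a hypothetical test set of 98 negatives and 2 positives, which is not the study's data. The helper names `mcc` and `auc` are illustrative; MCC is set to 0 when its denominator is zero (a common convention), and AUC is computed as the probability that a random positive outscores a random negative, with ties counted as one half.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0 when any confusion-matrix margin is empty."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced test set: 98 negatives, 2 positives.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100   # majority-class predictor: never outputs the positive class
scores = [0.0] * 100 # the same score for every instance, so no ranking information

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)             # 0.98
print(mcc(y_true, y_pred))  # 0.0
print(auc(y_true, scores))  # 0.5
```

Accuracy rewards the 98 correct negatives, while MCC and AUC expose that the positive class is never identified, which is why the study tracks them alongside the four standard metrics.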

Abstract

Machine learning models often misclassify the positive class in datasets with class imbalance. In addition, sophisticated models involve hyperparameters that must be tuned to optimal values. This study tunes the hyperparameters of random forest (RF) and support vector machine (SVM) models using 5-fold cross-validation, builds the best RF and SVM models for two data scenarios (the original training data and the oversampled training data), and compares the models' performance on the training and testing data. The tuned RF hyperparameters, the number of instances in a leaf node and the tree depth, were 500 and 10, respectively. The tuned SVM hyperparameters, gamma and the constant C, were 0.001 and 500, respectively. The benchmark models achieved around 98% on the accuracy, precision, recall, and F1 score metrics. However, they performed poorly on the Matthews Correlation Coefficient (MCC) and the Area Under the Curve (AUC), scoring 0.0000 and 0.5000, respectively: the models trained on the class-imbalanced dataset failed to predict the positive class. Although the best RF and SVM models trained on the oversampled dataset perform worse than both benchmark models on the four standard metrics, the best RF model improves MCC by approximately 0.07 (from 0.000 to 0.067) and AUC by approximately 0.11 (from 0.500 to 0.612), while the best SVM model shows similar improvements of approximately 0.06 (from 0.000 to 0.056) and 0.11 (from 0.500 to 0.611), respectively. Both the RF and SVM models improve in predicting the positive class, and the best RF model performs slightly better.
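The tuning procedure described in the abstract (a hyperparameter search scored by 5-fold cross-validation, with and without oversampling of the training data) can be sketched as follows. This is a minimal standard-library sketch under stated assumptions, not the study's implementation: folds are stratified by class; oversampling is simple random duplication of minority-class rows, applied to the training folds only (the study does not say which oversampling method it used); and the grid keys borrow scikit-learn's `RandomForestClassifier` (`min_samples_leaf`, `max_depth`) and `SVC` (`gamma`, `C`) parameter names, which is also an assumption. Grid values other than the selected ones (500, 10, 0.001, 500) are hypothetical, and `stub_fit_and_score` is a placeholder for actually training and scoring a model.

```python
import itertools
import random

def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k folds, dealing each class round-robin so every fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def random_oversample(indices, labels, seed=0):
    """Duplicate randomly chosen minority-class indices until both classes have the same size."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(indices)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(indices) + extra

def grid_search_cv(labels, grid, fit_and_score, k=5, oversample=False):
    """Return (best_params, mean_score) over every combination in `grid`, scored by k-fold CV."""
    folds = stratified_folds(labels, k)
    best_params, best_score = None, float("-inf")
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        fold_scores = []
        for f in range(k):
            val_idx = folds[f]
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            if oversample:
                # Balance the training folds only; the validation fold keeps the original imbalance.
                train_idx = random_oversample(train_idx, labels, seed=f)
            fold_scores.append(fit_and_score(params, train_idx, val_idx))
        mean_score = sum(fold_scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Hypothetical grids: only 500, 10, 0.001 and 500 come from the study; other values are illustrative.
RF_GRID = {"min_samples_leaf": [1, 100, 500], "max_depth": [5, 10, 20]}
SVM_GRID = {"gamma": [0.0001, 0.001, 0.01], "C": [1, 100, 500]}

def stub_fit_and_score(params, train_idx, val_idx):
    # Placeholder: a real version would fit a model configured with `params` on train_idx
    # and return a metric such as MCC or AUC on val_idx.
    return 0.0

labels = [0] * 90 + [1] * 10  # hypothetical 9:1 imbalance
folds = stratified_folds(labels, k=5)
print([sum(labels[i] for i in fold) for fold in folds])  # [2, 2, 2, 2, 2]
train_idx = [i for fold in folds[1:] for i in fold]
balanced = random_oversample(train_idx, labels)
print(sum(labels[i] for i in balanced), len(balanced))  # 72 144
best_params, best_score = grid_search_cv(labels, RF_GRID, stub_fit_and_score, oversample=True)
```

Oversampling inside each training fold, rather than before splitting, keeps duplicated minority rows out of the validation fold, so the cross-validated score is not inflated by copies of rows the model has already seen.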
