MCD-SMOTE improved classification performance and reduced cross-validation score variability for imbalanced data with outliers compared to traditional SMOTE and MDO methods.
Using the Mahalanobis distance based on the Minimum Covariance Determinant (MCD) in SMOTE improves classification performance and consistency on imbalanced datasets with outliers.
SMOTE (synthetic minority over-sampling technique) is the most widely used solution to the problem of imbalanced data. SMOTE selects nearest neighbors based on the Euclidean distance. However, the Euclidean distance has the disadvantage of ignoring the correlation between variables, whereas the Mahalanobis distance accounts for their covariance. Outliers, though, usually distort the Mahalanobis distance, because the classical covariance estimate it relies on is itself sensitive to outliers. To solve this problem, we compute the Mahalanobis distance with a covariance matrix estimated by MCD (minimum covariance determinant), and then apply this MCD-based Mahalanobis distance within SMOTE to generate new data. We showed that, in most cases, this method achieved high performance in classifying imbalanced data.
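The procedure above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the MCD covariance is assumed to be estimated on the minority class only, `mcd_covariance` is a simplified hypothetical version of the FastMCD idea (random h-subset starts refined by concentration steps, without the reweighting or consistency correction of the full algorithm), and `mahalanobis_knn` and `mcd_smote` are hypothetical helper names. Omitting the consistency correction does not affect the neighbor search, since rescaling the covariance matrix multiplies every distance by the same constant and leaves the neighbor ranking unchanged.

```python
import numpy as np


def mcd_covariance(X, h=None, n_starts=10, max_csteps=50, seed=0):
    """Simplified MCD (hypothetical helper): search for the h-subset whose sample
    covariance has the smallest determinant, via random starts refined by C-steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # common default subset size
    best_logdet, best_mu, best_cov = np.inf, None, None
    for _ in range(n_starts):
        subset = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(max_csteps):
            mu = X[subset].mean(axis=0)
            cov = np.atleast_2d(np.cov(X[subset], rowvar=False))
            diff = X - mu
            d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
            # C-step: keep the h points closest to the current estimate
            new_subset = np.sort(np.argsort(d2)[:h])
            if np.array_equal(new_subset, subset):
                break
            subset = new_subset
        mu = X[subset].mean(axis=0)
        cov = np.atleast_2d(np.cov(X[subset], rowvar=False))
        sign, logdet = np.linalg.slogdet(cov)
        if sign > 0 and logdet < best_logdet:  # keep the smallest determinant
            best_logdet, best_mu, best_cov = logdet, mu, cov
    if best_cov is None:
        raise ValueError("all candidate covariance matrices were singular")
    return best_mu, best_cov


def mahalanobis_knn(X, precision, k):
    """Indices of the k nearest neighbors of each row under a Mahalanobis metric."""
    diff = X[:, None, :] - X[None, :, :]  # pairwise differences, shape (n, n, p)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, precision, diff)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def mcd_smote(X_min, n_new, k=5, seed=0):
    """SMOTE-style interpolation, with neighbors found via the MCD-based distance."""
    rng = np.random.default_rng(seed)
    _, cov = mcd_covariance(X_min, seed=seed)
    nbrs = mahalanobis_knn(X_min, np.linalg.pinv(cov), k)
    base = rng.integers(0, len(X_min), size=n_new)          # random minority seeds
    nb = nbrs[base, rng.integers(0, k, size=n_new)]        # one random neighbor each
    gap = rng.random((n_new, 1))                           # position on the segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])


rng = np.random.default_rng(42)
X_min = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=40)
X_min[:3] += 8.0  # inject three outliers into the minority class
synthetic = mcd_smote(X_min, n_new=60, k=5)
print(synthetic.shape)  # (60, 2)
```

Replacing the precision matrix passed to `mahalanobis_knn` with the identity matrix recovers ordinary Euclidean neighbor selection, which makes the two variants easy to compare on the same data.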
Jung et al. conducted a study on imbalanced data classification. MCD-SMOTE (SMOTE with a Mahalanobis distance based on the Minimum Covariance Determinant) was compared with the original data, SMOTE, and MDO using the F1 score and the AUC score.
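The two evaluation metrics, and the fold-to-fold spread used to judge score variability, can be computed as below. This is a hedged, stdlib-only sketch: the function names `f1`, `auc`, and `fold_summary` are hypothetical, AUC is computed from its ranking interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half), and summarizing variability by the standard deviation of per-fold scores is an assumption about how such a comparison is typically reported.

```python
from statistics import mean, stdev


def f1(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN) for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores, positive=1):
    """ROC AUC via pairwise ranking; a positive/negative tie counts as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fold_summary(fold_scores):
    """Mean and sample standard deviation of per-fold cross-validation scores."""
    return mean(fold_scores), stdev(fold_scores)


y_true = [0, 0, 0, 0, 1, 1]
print(f1(y_true, [0, 0, 1, 0, 1, 0]))               # 0.5
print(auc(y_true, [0.1, 0.2, 0.7, 0.3, 0.8, 0.4]))  # 0.875
```

Under this reading, a smaller standard deviation from `fold_summary` would correspond to the reduced cross-validation score variability reported for MCD-SMOTE.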