Random Forest combined with Borderline-SMOTE achieved the strongest class-balanced performance, with a Macro-MCC of 0.8533, a Macro-F1 of 0.9073, and a Macro-Averaged ROC-AUC of approximately 0.986–0.989, in classifying fetal health from CTG data.
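The summary does not say exactly how Macro-MCC was computed. A common reading, assumed here, is the unweighted mean of one-vs-rest binary MCCs over the three fetal-status classes. Note that this differs from the single multiclass MCC returned by scikit-learn's `matthews_corrcoef`. The plain-Python sketch below (hypothetical helper names `binary_mcc` and `macro_mcc`) computes the assumed version from a confusion matrix:

```python
from math import sqrt

def binary_mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for one binary (one-vs-rest) split."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is undefined when any marginal is zero; return 0.0 then.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def macro_mcc(cm):
    """Unweighted mean of one-vs-rest MCC over all classes.

    cm[i][j] = number of samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c predicted as something else
        fp = sum(cm[r][c] for r in range(k)) - tp     # other classes predicted as c
        tn = total - tp - fn - fp
        scores.append(binary_mcc(tp, fp, fn, tn))
    return sum(scores) / k
```

With a perfect confusion matrix every one-vs-rest MCC equals 1, so the macro average is 1; a value such as 0.8533 indicates strong but imperfect agreement, weighted equally across normal, suspect, and pathological cases regardless of how many of each there are.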
Do resampling strategies improve the detection of minority pathological classes in machine learning models for fetal health classification using cardiotocography data?
Strategic oversampling techniques, particularly Borderline-SMOTE (BSMOTE) combined with Random Forest, significantly improve minority class discrimination in automated fetal health classification using cardiotocography data.
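Borderline-SMOTE concentrates synthetic minority samples near the class boundary instead of spreading them across the whole minority region as plain SMOTE does. A minimal plain-Python sketch of the Borderline-SMOTE1 idea follows. The function name, its parameters (`m` neighbours for the "danger" test, `k` minority neighbours for interpolation), and the toy data are assumptions for illustration, not the study's implementation, which the summary does not describe. In practice a library class such as imbalanced-learn's `BorderlineSMOTE` would be the usual choice, and resampling would normally be applied only to the training split so the held-out test set keeps its natural class distribution.

```python
import math
import random

def borderline_smote(X, y, minority, m=5, k=5, n_new=10, seed=0):
    """Minimal Borderline-SMOTE1 sketch (hypothetical helper, not the study's code).

    X: list of equal-length feature tuples; y: list of class labels;
    minority: the label to oversample. Returns synthetic minority samples.
    """
    rng = random.Random(seed)
    idx_min = [i for i, label in enumerate(y) if label == minority]
    if len(idx_min) < 2:
        return []  # need at least two minority samples to interpolate

    def nearest(i, pool, n):
        # Indices of the n samples in `pool` closest to X[i], excluding i itself.
        others = [j for j in pool if j != i]
        return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:n]

    # Step 1: a minority sample is "in danger" when at least half, but not all,
    # of its m nearest neighbours (over the whole training set) belong to other
    # classes. Samples surrounded only by other classes are treated as noise.
    danger = [i for i in idx_min
              if m / 2 <= sum(y[j] != minority
                              for j in nearest(i, range(len(X)), m)) < m]
    if not danger:
        return []

    # Step 2: interpolate between a danger sample and one of its k nearest
    # minority neighbours, at a uniformly random point on the segment.
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        j = rng.choice(nearest(i, idx_min, k))
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic

# Invented 1-D toy data: majority class 0 at 0..6; minority class 1 has a
# borderline pair at 3.5/3.6 and a "safe" cluster around 10.5.
X = [(float(v),) for v in range(7)] + [(3.5,), (3.6,), (10.0,), (10.5,), (11.0,)]
y = [0] * 7 + [1] * 5
new = borderline_smote(X, y, minority=1, m=3, k=1, n_new=5)
print(new)  # every value lies between 3.5 and 3.6
```

Only the two minority points embedded among majority samples are flagged as "in danger", so all synthetic samples land on the boundary segment between them, while the safe cluster near 10.5 is left untouched.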
Background/Objectives: Fetal health assessment is essential in prenatal care, influencing both maternal and fetal outcomes. Cardiotocography (CTG) monitors uterine contractions and fetal heart rate, yet manual interpretation exhibits significant inter-examiner variability. Machine learning offers automated alternatives; however, class imbalance in CTG datasets, where pathological cases constitute less than 10% of records, leads to poor detection of the minority classes. This study aims to provide the first systematic benchmark comparing five resampling strategies across seven classifier families for multi-class CTG classification, evaluated using imbalance-aware metrics rather than overall accuracy alone. Methods: Seven machine learning models were employed: Naïve Bayes (NB), Random Forest (RF), Linear Discriminant Analysis (LDA), k-Nearest Neighbors (KNN), Linear Support Vector Machine (SVM), Multinomial Logistic Regression (MLR), and Multi-Layer Perceptron (MLP). To address class imbalance, we evaluated the original unbalanced dataset (base) and five resampling methods: SMOTE, Borderline-SMOTE (BSMOTE), ADASYN, NearMiss, and SCUT. Performance was evaluated on a held-out test set using Balanced Accuracy (BACC), Macro-F1, the Macro-Matthews Correlation Coefficient (Macro-MCC), and Macro-Averaged ROC-AUC; per-class ROC curves are also reported. Results: Among all models, RF proved the most reliable. Training on the original distribution (base) yielded the highest BACC (0.9118), whereas RF combined with BSMOTE provided the strongest class-balanced performance (Macro-MCC = 0.8533, Macro-F1 = 0.9073) with a near-perfect Macro-Averaged ROC-AUC (approximately 0.986–0.989). Overall, resampling effects were model-dependent. While some classifiers performed best on the natural class distribution, oversampling techniques, particularly SMOTE and BSMOTE, yielded significant improvements in minority class discrimination and class-balanced metrics across multiple model families.
Notably, certain models benefited substantially from resampling, exhibiting enhanced Macro-F1, BACC, and minority class recall without sacrificing overall accuracy. Conclusions: These findings establish robust, model-agnostic baselines for CTG-based fetal health screening. They highlight that strategic oversampling can translate improved minority class discrimination into clinically meaningful performance gains, supporting deployment in cost-sensitive and threshold-aware clinical settings.
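The case for imbalance-aware metrics over overall accuracy can be seen with a toy confusion matrix. The counts below are invented for illustration (100 normal, 20 suspect, 10 pathological traces) and are not from the study; the hypothetical classifier labels every trace as normal. The helper names are likewise hypothetical, and the metrics follow their standard definitions: BACC as the mean per-class recall and Macro-F1 as the unweighted mean of per-class F1.

```python
def accuracy(cm):
    """Fraction of all samples on the diagonal; cm[i][j] = true i, predicted j."""
    total = sum(sum(row) for row in cm)
    return sum(cm[c][c] for c in range(len(cm))) / total

def balanced_accuracy(cm):
    """BACC: unweighted mean of per-class recall (assumes every class occurs)."""
    return sum(cm[c][c] / sum(cm[c]) for c in range(len(cm))) / len(cm)

def macro_f1(cm):
    """Unweighted mean of per-class F1; undefined precision/recall count as 0."""
    k = len(cm)
    f1s = []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))   # column total
        actual = sum(cm[c])                           # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / k

# Invented example; rows and columns = normal, suspect, pathological.
always_normal = [[100, 0, 0],
                 [20, 0, 0],
                 [10, 0, 0]]
print(round(accuracy(always_normal), 3))           # 0.769
print(round(balanced_accuracy(always_normal), 3))  # 0.333
print(round(macro_f1(always_normal), 3))           # 0.29
```

Accuracy stays near 77% even though no suspect or pathological trace is ever detected, whereas BACC and Macro-F1 collapse, which is why class-balanced metrics are the relevant yardstick for minority class detection.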
Hawrami et al. conducted a study in third-trimester pregnant women undergoing fetal health monitoring with cardiotocography (CTG), with fetal status classified as normal, suspect, or pathological (n = 2,126). Machine learning models combined with resampling methods for class imbalance correction (SMOTE, BSMOTE, ADASYN, NearMiss, SCUT) were compared with training on the original unbalanced dataset (baseline) for multi-class classification of fetal health from CTG data, and were evaluated on a held-out test set using Balanced Accuracy (BACC), Macro-F1, the Macro-Matthews Correlation Coefficient (Macro-MCC), and Macro-Averaged ROC-AUC. Random Forest combined with Borderline-SMOTE achieved the strongest class-balanced performance, with a Macro-MCC of 0.8533, a Macro-F1 of 0.9073, and a Macro-Averaged ROC-AUC of approximately 0.986–0.989.
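Macro-Averaged ROC-AUC is commonly computed one-vs-rest: each class in turn is treated as positive, a binary AUC is computed from that class's predicted scores, and the per-class AUCs are averaged without weighting. The summary does not state the averaging scheme, so one-vs-rest is an assumption here. The plain-Python sketch below uses the rank (Mann-Whitney) form of the AUC; the helper names and the probabilities are invented for illustration. In scikit-learn, the same quantity is available through `roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")`.

```python
def binary_auc(scores, is_positive):
    """AUC = P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / (len(pos) * len(neg))

def macro_auc_ovr(probs, y, classes):
    """Unweighted mean of one-vs-rest AUCs; probs[i][c] scores class classes[c]."""
    aucs = []
    for col, cls in enumerate(classes):
        scores = [row[col] for row in probs]
        aucs.append(binary_auc(scores, [label == cls for label in y]))
    return sum(aucs) / len(aucs)

# Invented predicted probabilities for four CTG records.
classes = ["normal", "suspect", "pathological"]
y = ["normal", "normal", "suspect", "pathological"]
probs = [[0.7, 0.2, 0.1],
         [0.4, 0.5, 0.1],   # a normal record that argmax would call "suspect"
         [0.3, 0.6, 0.1],
         [0.2, 0.3, 0.5]]
print(macro_auc_ovr(probs, y, classes))  # 1.0
```

The second record would be misclassified by an argmax decision, yet the macro AUC is still 1.0, because AUC measures how well scores rank positives above negatives rather than the correctness of any particular threshold. This makes it complementary to BACC, Macro-F1, and Macro-MCC, which are computed from hard labels.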