What is the clinical evidence from this study?

Study design: Other. Population: Third-trimester pregnant women undergoing fetal health monitoring with cardiotocography (CTG), with recordings classified as normal, suspect, or pathological (n=2126). Intervention: Machine learning models combined with resampling methods (SMOTE, BSMOTE, ADASYN, NearMiss, SCUT) to correct class imbalance in multi-class classification of fetal health from CTG data, compared with the original unbalanced dataset (baseline). Primary outcome: Balanced Accuracy (BACC), Macro-F1, Macro-Matthews Correlation Coefficient (Macro-MCC), and Macro-Averaged ROC-AUC on a held-out test set.

What does this research mean for the field?

Random Forest with BSMOTE achieved the strongest class-balanced performance for CTG fetal health classification, with a Macro-MCC of 0.8533 and a Macro-F1 of 0.9073. Novelty: novel finding. Consensus alignment: supports consensus.

What question did this study set out to answer?

The study aims to systematically benchmark five resampling strategies for multi-class classification of fetal health using CTG data.

February 8, 2026

Open Access

Addressing Class Imbalance in Fetal Health Classification: Rigorous Benchmarking of Multi-Class Resampling Methods on Cardiotocography Data

Key Result

Random Forest combined with Borderline-SMOTE achieved the strongest class-balanced performance in classifying fetal health from CTG data, with a Macro-MCC of 0.8533, a Macro-F1 of 0.9073, and a Macro-Averaged ROC-AUC of approximately 0.986–0.989.

Key Points

The study aims to systematically benchmark five resampling strategies for multi-class classification of fetal health using CTG data.
Utilized seven machine learning models: Naïve Bayes, Random Forest, Linear Discriminant Analysis, k-Nearest Neighbors, Linear Support Vector Machine, Multinomial Logistic Regression, Multi-Layer Perceptron.
Evaluated original unbalanced dataset and five resampling methods: SMOTE, BSMOTE, ADASYN, NearMiss, SCUT.
Performance measured with Balanced Accuracy, Macro-F1, Macro-MCC, and Macro-Averaged ROC-AUC.
Random Forest was the most reliable model overall; trained on the original distribution, it achieved the highest Balanced Accuracy (0.9118).
RF combined with BSMOTE yielded the strongest class-balanced metrics, including a Macro-MCC of 0.8533 and a Macro-F1 of 0.9073.
Oversampling techniques, particularly SMOTE and BSMOTE, significantly improved minority-class discrimination across multiple model families.
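The Macro-Averaged ROC-AUC can be read as a one-vs-rest average: each class in turn is treated as positive, a binary AUC is computed from the probability the model assigns to that class, and the per-class AUCs are averaged without weighting. A minimal pure-Python sketch, assuming labels encoded as 0 (normal), 1 (suspect), 2 (pathological) and a classifier that outputs one probability per class; the function names are hypothetical, and the paper's exact averaging scheme is not stated in this summary.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """AUC = probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]], n_classes: int) -> float:
    """Unweighted mean of the one-vs-rest AUCs of every class."""
    per_class: List[float] = []
    for c in range(n_classes):
        scores = [row[c] for row in proba]            # probability assigned to class c
        positives = [label == c for label in y_true]  # class c vs. all other classes
        per_class.append(binary_auc(scores, positives))
    return sum(per_class) / n_classes


# Toy example with perfectly ranked probabilities.
y_true = [0, 0, 1, 1, 2, 2]
proba = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1], [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7], [0.2, 0.1, 0.7],
]
print(macro_ovr_auc(y_true, proba, n_classes=3))  # 1.0
```

The pairwise loop is quadratic and only meant for readability; scikit-learn's `roc_auc_score(y_true, proba, multi_class="ovr", average="macro")` computes the same one-vs-rest macro average efficiently.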

Structured PICO

Do resampling strategies improve the detection of minority pathological classes in machine learning models for fetal health classification using cardiotocography data?

Population

Cardiotocography (CTG) datasets for fetal health classification, in which pathological cases constitute less than 10% of recordings

Intervention

Five resampling strategies (SMOTE, BSMOTE, ADASYN, NearMiss, and SCUT) applied to seven machine learning models
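SMOTE creates a synthetic minority sample on the line segment between a minority sample and one of its k nearest minority neighbours, x_new = x_i + λ·(x_nn − x_i) with λ drawn from [0, 1). Borderline-SMOTE (BSMOTE) applies the same interpolation but seeds only from "danger" minority samples: those for which at least half, but not all, of their m nearest neighbours in the whole training set belong to other classes (samples surrounded entirely by other classes are treated as noise). The pure-Python sketch below follows that Borderline-SMOTE1 rule as commonly described; the function names, the Euclidean distance, and the default k = 5 and m = 5 are assumptions rather than details from the study. For the three CTG classes, it would be run once per minority class.

```python
import math
import random


def knn_indices(points, query, k, exclude=None):
    """Indices of the k points nearest to `query` (Euclidean), optionally skipping one index."""
    candidates = [i for i in range(len(points)) if i != exclude]
    candidates.sort(key=lambda i: math.dist(points[i], query))
    return candidates[:k]


def interpolate(a, b, rng):
    """SMOTE step: a random point on the segment from a towards b."""
    gap = rng.random()  # lambda in [0, 1)
    return [ai + gap * (bi - ai) for ai, bi in zip(a, b)]


def smote(minority, n_new, k=5, seed=0):
    """Plain SMOTE: any minority sample can seed a synthetic point."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(knn_indices(minority, minority[i], k, exclude=i))
        synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


def danger_indices(X, y, target, m=5):
    """Minority samples whose m nearest neighbours (any class) are at least half, but not all, other-class."""
    danger = []
    for i, (x, label) in enumerate(zip(X, y)):
        if label != target:
            continue
        n_other = sum(1 for j in knn_indices(X, x, m, exclude=i) if y[j] != target)
        if m / 2 <= n_other < m:  # all-other-class neighbourhoods are treated as noise
            danger.append(i)
    return danger


def borderline_smote(X, y, target, n_new, k=5, m=5, seed=0):
    """Borderline-SMOTE1: seed only from 'danger' samples, interpolate towards minority neighbours."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == target]
    minority = [X[i] for i in minority_idx]
    danger = danger_indices(X, y, target, m)
    if not danger:
        return []  # no borderline samples, so nothing is generated
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        pos = minority_idx.index(i)  # position of X[i] within `minority`
        j = rng.choice(knn_indices(minority, X[i], k, exclude=pos))
        synthetic.append(interpolate(X[i], minority[j], rng))
    return synthetic


# Toy data: label 2 is the minority class; sample 2 sits next to two majority points.
X = [[0.0, 0.0], [0.2, 0.0], [0.4, 0.0], [10.0, 10.0],
     [10.1, 10.0], [10.0, 10.1], [9.9, 10.0], [0.5, 0.0], [0.5, 0.1]]
y = [2, 2, 2, 2, 0, 0, 0, 0, 0]
print(danger_indices(X, y, target=2, m=3))  # [2]
new_points = borderline_smote(X, y, target=2, n_new=4, k=2, m=3)
```

In the toy data, minority sample 3 is surrounded only by majority points, so it counts as noise and is never used as a seed; only the boundary sample 2 generates new points. Library implementations such as imbalanced-learn's `BorderlineSMOTE` handle the multi-class bookkeeping and are the practical choice.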

Comparator

Original unbalanced dataset (base)

Outcome

Classification performance evaluated using Balanced Accuracy (BACC), Macro-F1, Macro-Matthews Correlation Coefficient (Macro-MCC), and Macro-Averaged ROC-AUC on a held-out test set
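The three threshold-based outcome metrics can all be derived from one multi-class confusion matrix by treating each class in turn as positive (one-vs-rest): BACC is the mean per-class recall, Macro-F1 the mean per-class F1, and Macro-MCC is taken here as the mean of the per-class one-vs-rest MCCs. The summary does not define Macro-MCC precisely, so that reading is an assumption, as is scoring undefined per-class values (zero denominators) as 0; the function names are hypothetical.

```python
import math


def confusion(y_true, y_pred, n_classes):
    """Confusion matrix: rows = true class, columns = predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def one_vs_rest_counts(cm, c):
    """TP, FP, FN, TN when class c is positive and every other class is negative."""
    total = sum(map(sum, cm))
    tp = cm[c][c]
    fn = sum(cm[c]) - tp
    fp = sum(row[c] for row in cm) - tp
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def imbalance_metrics(y_true, y_pred, n_classes):
    """Balanced accuracy, Macro-F1 and Macro-MCC (unweighted means over classes)."""
    cm = confusion(y_true, y_pred, n_classes)
    recalls, f1s, mccs = [], [], []
    for c in range(n_classes):
        tp, fp, fn, tn = one_vs_rest_counts(cm, c)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mccs.append((tp * tn - fp * fn) / denom if denom else 0.0)
    return {
        "BACC": sum(recalls) / n_classes,
        "Macro-F1": sum(f1s) / n_classes,
        "Macro-MCC": sum(mccs) / n_classes,
    }


# 0 = normal, 1 = suspect, 2 = pathological (toy predictions)
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
m = imbalance_metrics(y_true, y_pred, n_classes=3)
print(m["BACC"])  # 0.75
```

Because every class weighs equally in these averages, a model that misses most pathological cases is penalised even when plain accuracy looks high. Note that scikit-learn's `matthews_corrcoef` returns a single multi-class MCC for multi-class input rather than this per-class average, so the two values generally differ.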

Strategic oversampling techniques, particularly BSMOTE combined with Random Forest, significantly improve minority class discrimination in automated fetal health classification using cardiotocography data.

Limitations

Study used a publicly available retrospective dataset, which may limit generalizability to clinical practice.
No prospective validation or external dataset evaluation reported.
Lack of demographic data limits assessment of population diversity and applicability.
Study focused on algorithmic benchmarking rather than clinical outcomes.

Abstract

Background/Objectives: Fetal health is essential in prenatal care, influencing both maternal and fetal outcomes. Cardiotocography (CTG) monitors uterine contractions and fetal heart rate, yet manual interpretation exhibits significant inter-examiner variability. Machine learning offers automated alternatives; however, class imbalance in CTG datasets, where pathological cases constitute less than 10% of recordings, leads to poor detection of minority classes. This study aims to provide the first systematic benchmark comparing five resampling strategies across seven classifier families for multi-class CTG classification, evaluated using imbalance-aware metrics rather than overall accuracy alone.

Methods: Seven machine learning models were employed: Naïve Bayes (NB), Random Forest (RF), Linear Discriminant Analysis (LDA), k-Nearest Neighbors (KNN), Linear Support Vector Machine (SVM), Multinomial Logistic Regression (MLR), and Multi-Layer Perceptron (MLP). To address class imbalance, we evaluated the original unbalanced dataset (base) and five resampling methods: SMOTE, BSMOTE, ADASYN, NearMiss, and SCUT. Performance was evaluated on a held-out test set using Balanced Accuracy (BACC), Macro-F1, the Macro-Matthews Correlation Coefficient (Macro-MCC), and Macro-Averaged ROC-AUC. We also report per-class ROC curves.

Results: Among all models, RF proved the most reliable. Training RF on the original distribution (base) yielded the highest BACC (0.9118), whereas RF combined with BSMOTE provided the strongest class-balanced performance (Macro-MCC = 0.8533, Macro-F1 = 0.9073) with a near-perfect Macro-Averaged ROC-AUC (approximately 0.986–0.989). Overall, resampling effects proved model-dependent. While some classifiers performed best on the natural class distribution, oversampling techniques, particularly SMOTE and BSMOTE, significantly improved minority-class discrimination and class-balanced metrics across multiple model families. Notably, certain models benefited substantially from resampling, with higher Macro-F1, BACC, and minority-class recall and no loss in overall accuracy.

Conclusions: These findings establish robust, model-agnostic baselines for CTG-based fetal health screening. They show that strategic oversampling can translate improved minority-class discrimination into clinically meaningful performance gains, supporting deployment in cost-sensitive and threshold-aware clinical settings.
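Because performance is reported on a held-out test set, any resampling has to be fitted on the training portion only; otherwise synthetic points derived from test recordings would leak into training and inflate the scores. A minimal sketch of that ordering, assuming a stratified 80/20 split (the summary does not give the split ratio) and the class counts of the public UCI Cardiotocography data (1,655 normal, 295 suspect, 176 pathological), which the summary does not state but which add up to its n = 2126; `stratified_split` and `oversampling_targets` are hypothetical helpers.

```python
import random
from collections import Counter, defaultdict


def stratified_split(y, test_frac=0.2, seed=0):
    """Split sample indices into (train, test), keeping each class's share roughly constant."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def oversampling_targets(y_train):
    """Synthetic samples each class needs to reach the size of the largest training class."""
    counts = Counter(y_train)
    largest = max(counts.values())
    return {label: largest - n for label, n in counts.items()}


# Assumed class counts: 0 = normal, 1 = suspect, 2 = pathological.
y = [0] * 1655 + [1] * 295 + [2] * 176
train_idx, test_idx = stratified_split(y)
y_train = [y[i] for i in train_idx]
print(len(train_idx), len(test_idx))  # 1701 425
print(oversampling_targets(y_train))  # {0: 0, 1: 1088, 2: 1183}
# A resampler (SMOTE, BSMOTE, ...) would now be fitted on train_idx only;
# test_idx stays untouched for BACC, Macro-F1, Macro-MCC and ROC-AUC.
```

Balancing every class up to the majority count is one common target for oversamplers; the sampling ratios actually used in the study are not given in this summary.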
