Introduction: Breast cancer remains a significant global health issue, and rapid alternative diagnostic methods are needed to improve survival rates. This study aimed to evaluate observer performance in the manual segmentation of Dynamic Contrast-Enhanced MRI (DCE-MRI) images and to assess the effectiveness of radiomic features and machine learning (ML) in classifying breast lesions as benign or malignant. Methods: Breast lesions from 155 patients (65 benign, 90 malignant) were manually segmented on DCE-MRI images by four experienced radiologists using 3D Slicer (version 5.6.1). From each lesion, 107 radiomic features, including shape, first-order, and texture features, were extracted, yielding a high-dimensional dataset. All features were normalized using Z-score scaling. Feature selection was performed using LASSO regression with five-fold cross-validation. The dataset was divided into training and testing sets in a 70:30 ratio, and model performance was evaluated using five-fold cross-validation. The top 20 radiomic features were selected based on intraclass correlation coefficient (ICC) analysis to ensure feature stability. Nine ML models (CatBoost, Random Forest, XGBoost, AdaBoost, Naïve Bayes, Logistic Regression, k-NN, SVC, and MLP) were employed for classification. Hyperparameter tuning was applied to optimize model performance, and SHapley Additive exPlanations (SHAP) were used to identify key predictive features. Results: ICC values ranged from 0.941 to 0.992 (95% CI), demonstrating excellent reliability across all radiomic feature categories. CatBoost outperformed the other models, achieving an AUC of 0.937 (95% CI: 0.852-0.993), a sensitivity of 0.889, and a specificity of 0.909 on the internal test set. Other models, such as Random Forest (AUC: 0.881, 95% CI: 0.758-0.972) and Naïve Bayes (AUC: 0.843, 95% CI: 0.707-0.949), performed well but were less effective than CatBoost. SHAP analysis identified several radiomic features as important for distinguishing malignant from benign lesions. Discussion: Ensemble-based models generally outperformed traditional classifiers, such as Logistic Regression and k-NN, possibly because they can capture non-linear relationships in the dataset. SHAP analysis also supported model interpretability by identifying the features that contributed most to the classification task. Conclusion: This study demonstrates the potential of integrating radiomic features with ML for breast lesion classification. CatBoost exhibited the highest predictive performance, highlighting its effectiveness in distinguishing malignant from benign lesions.
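The Methods steps described above (Z-score scaling, LASSO-based feature selection with five-fold cross-validation, a 70:30 split, and evaluation by AUC, sensitivity, and specificity) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code. It assumes synthetic data from `make_classification` in place of the real radiomic features, a Random Forest as a stand-in for the nine models, scaling and selection fitted on the training split only (the abstract does not state the order), and a 0.5 probability threshold. The function name `run_pipeline` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def run_pipeline(X, y, seed=0):
    """Scale, select features with LASSO, fit a classifier, report test metrics."""
    # 70:30 stratified split into training and test sets.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )

    # Z-score scaling; fitted on the training split only (an assumption).
    scaler = StandardScaler().fit(X_tr)
    X_tr_z, X_te_z = scaler.transform(X_tr), scaler.transform(X_te)

    # LASSO with five-fold cross-validation; keep features with non-zero weight.
    lasso = LassoCV(cv=5, random_state=seed, max_iter=10000).fit(X_tr_z, y_tr)
    selected = np.flatnonzero(lasso.coef_ != 0)
    if selected.size == 0:
        # Fallback for the degenerate case where LASSO drops everything.
        selected = np.arange(X.shape[1])

    # Random Forest is one of the nine model families named in the abstract.
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_tr_z[:, selected], y_tr)

    # Probabilities for the positive ("malignant") class on the held-out set.
    prob = clf.predict_proba(X_te_z[:, selected])[:, 1]
    pred = (prob >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()

    return {
        "n_selected": int(selected.size),
        "auc": float(roc_auc_score(y_te, prob)),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Synthetic stand-in: 155 samples, 107 features, roughly 65:90 class balance.
X, y = make_classification(
    n_samples=155,
    n_features=107,
    n_informative=10,
    weights=[65 / 155, 90 / 155],
    random_state=0,
)
results = run_pipeline(X, y)
print(results)
```

Because the data are synthetic, the printed metrics say nothing about the study's results; the sketch only shows how the reported quantities relate to the pipeline steps.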
Ismail et al. studied this question.