Abstract

Objective
Cardiovascular diseases (CVDs) remain the leading cause of global mortality, necessitating robust predictive tools. While machine learning (ML) models have shown promise, transformer-based deep learning architectures are emerging as powerful alternatives. However, their comparative performance, especially under real-world challenges like class imbalance, remains underexplored.

Methods
We systematically benchmarked state-of-the-art transformer-based models (e.g., FT-Transformer, SAINT, TabNet) against traditional ML algorithms (e.g., XGBoost, Logistic Regression) across three public CVD datasets of varying size and complexity: the small, balanced UCI dataset; the medium-sized, imbalanced Framingham dataset; and the large-scale Kaggle CVD dataset. A unified preprocessing pipeline was applied, including MICE for imputing missing values and SMOTETomek resampling for class imbalance in Framingham. Models were evaluated via stratified 10-fold cross-validation. SHAP analysis was used to assess feature importance and explainability.

Results
Model performance varied significantly with dataset characteristics. On the clean UCI dataset, FT-Transformer and Random Forest achieved near-perfect metrics (AUC > 0.99). In contrast, on the imbalanced Framingham dataset, most models showed poor sensitivity despite resampling; FT-Transformer maintained the best balance. On the Kaggle dataset, FT-Transformer and XGBoost performed similarly, identifying systolic blood pressure and age as key predictors.

Conclusion
Transformer-based models like FT-Transformer are effective on large, structured datasets but exhibit performance degradation under imbalance and noise. Conventional models, particularly XGBoost, remain competitive in these settings. These findings underscore the importance of dataset-aware model selection and highlight that deep learning is not universally superior for CVD risk prediction.
Upadhyayula et al. studied this question.
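
The stratified 10-fold cross-validation named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the helper `stratified_k_fold`, the seed, and the 850/150 label split are hypothetical and chosen only to show how stratification keeps the minority-class share constant across folds, which matters on an imbalanced dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Split indices into k test folds with a near-equal share of each class.

    Hypothetical helper for illustration; a real pipeline would more likely
    rely on an established library implementation.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Shuffle within each class, then deal indices round-robin into the folds.
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


# Assumed toy labels: 850 negatives and 150 positives (not taken from the paper).
labels = [0] * 850 + [1] * 150
folds = stratified_k_fold(labels, k=10)

for i, test_idx in enumerate(folds):
    positives = sum(labels[j] for j in test_idx)
    print(f"fold {i}: n={len(test_idx)}, positives={positives}")
```

With these assumed labels every test fold holds 100 samples, 15 of them positive, so sensitivity is estimated on the same class ratio in each fold. An unstratified random split gives no such guarantee, and a fold with very few positives would make the sensitivity estimate unstable.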