What type of study is this?

This is a Literature Review study (also classified as: Observational).

August 17, 2025 · Open Access

Benchmarking Transformer-Based and Conventional Machine Learning Models for Cardiovascular Disease Prediction on Datasets of Varying Scale and Complexity

Key points

Transformer-based models show promise for cardiovascular disease prediction but struggle with imbalanced data, indicating a need for careful model selection.
On the UCI dataset, FT-Transformer and Random Forest achieved near-perfect metrics (AUC > 0.99), while performance varied on the more complex datasets.
The preprocessing pipeline used MICE to impute missing values and SMOTETomek resampling to address class imbalance in Framingham; models were evaluated with stratified 10-fold cross-validation, and most still showed poor sensitivity on Framingham despite resampling.
SHAP analysis identified systolic blood pressure and age as key predictors of cardiovascular risk on the Kaggle dataset.
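The point about imbalanced data can be made concrete with a small sketch. The labels below are hypothetical toy values, not results from the study; the helper names `confusion_counts`, `accuracy`, and `sensitivity` are introduced here only for illustration. The sketch shows how a classifier can report high accuracy while missing most positive cases.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all cases classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_pred):
    """Fraction of true positive cases that the model detects (recall)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


# Hypothetical toy data: 15 positive cases among 100 patients.
y_true = [1] * 15 + [0] * 85
# Hypothetical model that detects only 3 of the 15 positives
# and labels every negative correctly.
y_pred = [1] * 3 + [0] * 12 + [0] * 85

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f}")
# prints: accuracy=0.88 sensitivity=0.20
```

In this toy case accuracy looks acceptable even though four out of five positive cases are missed, which is why sensitivity is worth reporting alongside accuracy and AUC on imbalanced data.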

Summary

Objective: Cardiovascular diseases (CVDs) remain the leading cause of global mortality, necessitating robust predictive tools. While machine learning (ML) models have shown promise, transformer-based deep learning architectures are emerging as powerful alternatives. However, their comparative performance, especially under real-world challenges like class imbalance, remains underexplored.

Methods: We systematically benchmarked state-of-the-art transformer-based models (e.g., FT-Transformer, SAINT, TabNet) against traditional ML algorithms (e.g., XGBoost, Logistic Regression) across three public CVD datasets of varying size and complexity: the small, balanced UCI dataset, the medium-sized, imbalanced Framingham dataset, and the large-scale Kaggle CVD dataset. A unified preprocessing pipeline was applied, including MICE for imputing missing values and SMOTETomek resampling for class imbalance in Framingham. Models were evaluated via stratified 10-fold cross-validation. SHAP analysis was used to assess feature importance and explainability.

Results: Model performance varied significantly with dataset characteristics. On the clean UCI dataset, FT-Transformer and Random Forest achieved near-perfect metrics (AUC > 0.99). In contrast, on the imbalanced Framingham dataset, most models showed poor sensitivity despite resampling; FT-Transformer maintained the best balance. On the Kaggle dataset, FT-Transformer and XGBoost performed similarly, identifying systolic blood pressure and age as key predictors.

Conclusion: Transformer-based models like FT-Transformer are effective on large, structured datasets but exhibit performance degradation under imbalance and noise. Conventional models, particularly XGBoost, remain competitive in these settings. These findings underscore the importance of dataset-aware model selection and highlight that deep learning is not universally superior for CVD risk prediction.
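The stratified 10-fold cross-validation mentioned in the methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `stratified_k_fold`, the seed, and the toy label vector are hypothetical, and a real pipeline would more likely rely on an existing library implementation.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class ratio.

    Indices of each class are shuffled and dealt round-robin into k folds,
    so every fold receives a near-equal share of every class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(
            index
            for fold_number in range(k)
            if fold_number != held_out
            for index in folds[fold_number]
        )
        yield train_idx, test_idx


# Hypothetical imbalanced labels: 30 positives, 170 negatives.
labels = [1] * 30 + [0] * 170
splits = list(stratified_k_fold(labels, k=10))

for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    # Each test fold holds 20 cases, 3 of them positive (same 15% rate).
    print(len(test_idx), positives)
```

The abstract does not say at which stage resampling was applied. A common precaution is to apply resampling such as SMOTETomek only to `train_idx` within each split, so that synthetic samples never appear in the held-out fold.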


Cite This Study

Upadhyayula et al. (2025) studied this question.

synapsesocial.com/papers/68a36a560a429f797332f1f9 https://doi.org/10.1101/2025.08.03.25332878
