Abstract
Cardiovascular disease (CVD) prediction across heterogeneous clinical datasets remains challenging due to feature inconsistencies, missing values, and distribution shifts between cohorts. This study proposes a Hybrid Feature Tokenizer Transformer (HyFT) that integrates a Feature Tokenizer Transformer with a residual Multilayer Perceptron (MLP) branch for transferable CVD risk prediction. The Framingham and Kaggle datasets were harmonized and merged for training, while the Cleveland dataset was strictly reserved for external validation to evaluate cross-cohort generalization. A comprehensive preprocessing pipeline was implemented, comprising canonical feature mapping, imputation with missingness indicators, standardization, Salp Swarm Algorithm (SSA) based feature selection, Principal Component Analysis (PCA), and Particle Swarm Optimization (PSO) hyperparameter tuning. The proposed HyFT model achieved 0.93 accuracy and 0.92 Macro-F1 on the unseen Cleveland cohort, outperforming Support Vector Machine (SVM), Random Forest (RF), XGBoost, and MLP baselines. Ablation and bootstrap confidence interval analyses confirmed the statistical reliability of the results and the contribution of each component, demonstrating robust and clinically interpretable multi-cohort CVD prediction.
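The abstract reports Macro-F1 on the held-out cohort together with bootstrap confidence intervals but does not say how the intervals were computed. The following is a minimal sketch of one common approach, a percentile bootstrap over the external test set. The function names, the 1000 resamples, the 95% level, and the toy labels are illustrative assumptions, not details taken from the study.

```python
import random


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels present."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(y_true, y_pred, metric=macro_f1, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed method, not confirmed by the source)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample test cases with replacement, keeping label/prediction pairs aligned
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy labels standing in for external-cohort predictions
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

point = macro_f1(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"Macro-F1 = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Resampling only the external test cases, with the trained model held fixed, measures uncertainty from the finite size of the validation cohort; it does not capture variability from retraining.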
T. K. Revathi
B. Sathiyabhama
S. Kaliraj
Manipal Academy of Higher Education
Revathi et al. studied this question.
synapsesocial.com/papers/6a1d224302fbce9130638117 — DOI: https://doi.org/10.1007/s42452-026-08905-6