What question did this study set out to answer?

This research aims to develop a model that accurately predicts cardiovascular disease risk across various clinical datasets, addressing the challenges posed by feature inconsistencies.

June 1, 2026 | Open Access

Hybrid FT-transformer with residual MLP for transferable cardiovascular risk prediction across cohorts

Key Points

Hybrid Feature Tokenizer Transformer (HyFT) was created by integrating a Feature Tokenizer Transformer with a residual Multilayer Perceptron.
The Framingham and Kaggle datasets were harmonized and merged for model training, while the Cleveland dataset was reserved for external validation.
The preprocessing pipeline comprised canonical feature mapping, imputation with missingness indicators, standardization, feature selection via the Salp Swarm Algorithm (SSA), and Principal Component Analysis (PCA); hyperparameters were tuned with Particle Swarm Optimization (PSO).
HyFT achieved 0.93 accuracy and 0.92 Macro-F1 on the Cleveland cohort.
The model outperformed SVM, Random Forest, XGBoost, and MLP baselines in predictive performance.
Ablation and bootstrap confidence interval analyses confirmed the statistical reliability of the results and the contribution of each model component.
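
To make the reported metric concrete, the sketch below computes Macro-F1 (the unweighted mean of per-class F1 scores) and attaches a percentile bootstrap confidence interval to it. The source does not state which bootstrap variant, number of resamples, or confidence level the authors used, so the percentile method, `n_resamples=2000`, and `alpha=0.05` are assumptions, as are the function names and the toy labels, which are made up for illustration and are not data from the study.

```python
import random


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the labels present in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Macro-F1 (assumed variant)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample patients with replacement, keeping label/prediction pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy labels for illustration only (not results from the paper).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
point = macro_f1(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"Macro-F1 = {point:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With a held-out cohort as small as Cleveland, an interval of this kind is a useful companion to a single point estimate, since a few misclassified patients can shift Macro-F1 noticeably.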

Abstract

Cardiovascular disease (CVD) prediction across heterogeneous clinical datasets remains challenging due to feature inconsistencies, missing values, and distribution shifts between cohorts. This study proposes a Hybrid Feature Tokenizer Transformer (HyFT) integrating a Feature Tokenizer Transformer with a residual Multilayer Perceptron branch for transferable CVD risk prediction. Framingham and Kaggle datasets were harmonized and merged for training, while the Cleveland dataset was strictly reserved for external validation to evaluate cross-cohort generalization. A comprehensive preprocessing pipeline was implemented, including canonical feature mapping, imputation with missingness indicators, standardization, Salp Swarm Algorithm (SSA)-based feature selection, Principal Component Analysis (PCA), and Particle Swarm Optimization (PSO) hyperparameter tuning. The proposed HyFT model achieved 0.93 accuracy and 0.92 Macro-F1 on the unseen Cleveland cohort, outperforming Support Vector Machine (SVM), Random Forest (RF), XGBoost, and Multi-Layer Perceptron (MLP) baselines. Ablation and bootstrap confidence interval analyses confirmed the statistical reliability and contribution of each component, demonstrating robust and clinically interpretable multi-cohort CVD prediction.
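
As a rough illustration of what combining a Feature Tokenizer Transformer with a residual MLP branch can look like, the NumPy sketch below runs one untrained forward pass. It is not the authors' implementation: the abstract does not give the depth, number of attention heads, token width, or how the two branches are fused, so the single-head attention layer, the concatenation of the [CLS] output with the MLP output, and all names and sizes (`init_params`, `hyft_forward`, `d_token=8`, `d_hidden=16`) are assumptions. Weights are random, so the output probabilities carry no clinical meaning.

```python
import numpy as np

rng = np.random.default_rng(0)


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_params(n_features, d_token=8, d_hidden=16, scale=0.1):
    """Random, untrained weights; sizes are illustrative assumptions."""
    def w(*shape):
        return rng.normal(0.0, scale, shape)
    return {
        "tok_w": w(n_features, d_token), "tok_b": w(n_features, d_token),
        "cls": w(1, d_token),
        "wq": w(d_token, d_token), "wk": w(d_token, d_token), "wv": w(d_token, d_token),
        "mlp_in": w(n_features, d_hidden),
        "mlp_w1": w(d_hidden, d_hidden), "mlp_w2": w(d_hidden, d_hidden),
        "head": w(d_token + d_hidden, 1),
    }


def feature_tokenizer(x, p):
    # Each numeric feature j becomes its own token: x_j * W_j + b_j.
    tokens = x[:, :, None] * p["tok_w"][None] + p["tok_b"][None]
    cls = np.broadcast_to(p["cls"], (x.shape[0], 1, p["cls"].shape[1]))
    return np.concatenate([cls, tokens], axis=1)  # (batch, n_features + 1, d_token)


def attention_block(t, p):
    # Single-head self-attention over feature tokens, with a skip connection.
    h = layer_norm(t)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return t + softmax(scores) @ v


def residual_mlp(x, p):
    # MLP branch on the raw feature vector, with one residual block.
    h = x @ p["mlp_in"]
    return h + np.maximum(h @ p["mlp_w1"], 0.0) @ p["mlp_w2"]


def hyft_forward(x, p):
    cls_out = attention_block(feature_tokenizer(x, p), p)[:, 0, :]
    # Assumed fusion: concatenate the [CLS] representation with the MLP output.
    fused = np.concatenate([cls_out, residual_mlp(x, p)], axis=1)
    logit = (fused @ p["head"])[:, 0]
    return 1.0 / (1.0 + np.exp(-logit))  # predicted probability of CVD


x = rng.normal(size=(4, 5))  # 4 hypothetical patients, 5 standardized features
params = init_params(n_features=5)
probs = hyft_forward(x, params)
print(probs.shape)
```

The intuition behind such a hybrid is that the attention branch can model interactions between individual features, while the MLP branch gives a direct path from the standardized feature vector to the output.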

Authors

T. K. Revathi

B. Sathiyabhama

S. Kaliraj

Manipal Academy of Higher Education
