What question did this study set out to answer?

The aim is to develop an explainable AI model for heart disease diagnosis that improves accuracy and trust in predictions.

March 6, 2026 | Open Access

CardiaTics: An explainable AI integrated heart disease diagnosis model with feature engineering and stacked ensemble approach

Q: What is the clinical evidence from this study?

Study design: Other. Population: Individuals with suspected heart disease aged 30-77 from the UCI Machine Learning Repository dataset (n=1,190). Intervention: CardiaTics stacked ensemble machine learning model with feature engineering using Pearson correlation, Chi-Square Test, and Recursive Feature Elimination (RFE) vs. individual machine learning classifiers (e.g., Random Forest, MLP, KNN, ETC, XGB, SVC, SGD, ABC, CART, GBM). Primary outcome: Accuracy of heart disease prediction (CardiaTics accuracy rose from 89.3% on raw data to 93.3% after feature selection, exceeding the individual classifiers).

Key Result

CardiaTics, a stacked ensemble model with feature engineering, improved heart disease prediction accuracy from 89.3% on raw data to 93.3% after feature selection, outperforming the individual classifiers.

Key Points

The aim is to develop an explainable AI model for heart disease diagnosis that improves accuracy and trust in predictions.
Developed a stacked ensemble machine learning model named CardiaTics.
Performed outlier detection to ensure data integrity before analysis.
Applied ten distinct machine learning algorithms individually before combining them into a stacked model.
Utilized feature engineering techniques including Pearson correlation, Chi-Square Test, and Recursive Feature Elimination.
Employed SHAP and ELI5 to explain feature importance for interpretability.
Achieved an accuracy of 89.3% on raw data and 93.3% after feature selection.
Outperformed each individual classifier after feature selection, although no statistical significance test was reported.
Enhanced interpretability of the model through SHAP summary plots, revealing key contributors to heart disease.
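
The stacking step described in the points above can be illustrated with a minimal sketch. This is not the authors' implementation: the ten classifiers named in the study are replaced here by one-feature decision stumps, the meta-learner is a small hand-written logistic regression, and the data are synthetic. All function names are hypothetical. The part that matters is the structure: base learners produce out-of-fold predictions, and a second-level model learns from those predictions.

```python
import math
import random


def fit_stump(xs, ys):
    """Pick the (threshold, polarity) with the fewest training errors on one feature."""
    best = (None, 1, len(ys) + 1)
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            errors = sum(
                int((polarity * (x - t) >= 0) != bool(y)) for x, y in zip(xs, ys)
            )
            if errors < best[2]:
                best = (t, polarity, errors)
    return best[0], best[1]


def stump_predict(stump, x):
    t, polarity = stump
    return 1 if polarity * (x - t) >= 0 else 0


def fit_logistic(rows, ys, epochs=200, lr=0.5):
    """Meta-learner: logistic regression trained by plain stochastic gradient descent."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for row, y in zip(rows, ys):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            grad = 1.0 / (1.0 + math.exp(-z)) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, row)]
            b -= lr * grad
    return w, b


def fit_stack(X, y, n_folds=3):
    n, d = len(X), len(X[0])
    meta_rows = [[0] * d for _ in range(n)]
    # Out-of-fold predictions: each row is scored by stumps that never saw it.
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        held_idx = [i for i in range(n) if i % n_folds == fold]
        for j in range(d):
            stump = fit_stump([X[i][j] for i in train_idx], [y[i] for i in train_idx])
            for i in held_idx:
                meta_rows[i][j] = stump_predict(stump, X[i][j])
    meta = fit_logistic(meta_rows, y)
    # Refit every base learner on the full training set for use at prediction time.
    base = [fit_stump([row[j] for row in X], y) for j in range(d)]
    return base, meta


def predict_stack(model, row):
    base, (w, b) = model
    feats = [stump_predict(s, x) for s, x in zip(base, row)]
    return 1 if b + sum(wi * fi for wi, fi in zip(w, feats)) >= 0 else 0


# Synthetic data (an assumption for illustration, not the UCI dataset).
rng = random.Random(0)
X, y = [], []
for _ in range(120):
    label = rng.randint(0, 1)
    X.append([label + rng.gauss(0, 0.6), label + rng.gauss(0, 0.9), rng.gauss(0, 1)])
    y.append(label)

model = fit_stack(X[:90], y[:90])
preds = [predict_stack(model, row) for row in X[90:]]
accuracy = sum(p == t for p, t in zip(preds, y[90:])) / len(preds)
print(f"toy stacked accuracy: {accuracy:.3f}")
```

Training the meta-learner on out-of-fold predictions rather than in-sample predictions is the usual way to keep the second level from simply trusting whichever base learner overfits most.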

Structured PICO

Does the CardiaTics stacked ensemble model improve the accuracy of heart disease diagnosis compared to individual classifiers?

Population

1,190 individuals from the UCI Machine Learning Repository (629 with heart disease, 561 normal), age range 30-77.

Intervention

CardiaTics, a stacked ensemble machine learning model integrating ten distinct algorithms with feature engineering (Pearson correlation, Chi-Square Test, Recursive Feature Elimination) and explainable AI (SHAP, ELI5).
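
Of the three feature-selection techniques listed, the Pearson correlation filter is the simplest to sketch. The example below is an illustration under assumptions, not the study's code: the column names and values are made up, and keeping the top k features by absolute correlation with the label is one common way to apply the filter. The Chi-Square Test and Recursive Feature Elimination are not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_features(columns, target, k):
    """Keep the k columns whose absolute correlation with the target is highest."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy columns; not taken from the UCI dataset.
target = [0, 0, 0, 1, 1, 1]
columns = {
    "feat_a": [1, 2, 3, 4, 5, 6],  # rises with the label
    "feat_b": [5, 5, 5, 5, 5, 5],  # constant, carries no information
    "feat_c": [1, 0, 1, 0, 1, 0],  # weakly (negatively) related
}
selected = rank_features(columns, target, k=2)
print(selected)
```

A correlation filter only captures linear, one-feature-at-a-time relationships, which is one reason it is typically combined with other selection methods.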

Comparator

Individual machine learning classifiers and raw data without feature selection.

Outcome

Accuracy of heart disease detection

The CardiaTics stacked ensemble machine learning model, enhanced with feature engineering and explainable AI, achieves 93.3% accuracy in diagnosing heart disease while maintaining interpretability.

Main Result

Effect estimate: CardiaTics accuracy increased from 89.3% on raw data to 93.3% after feature selection, exceeding the individual classifiers

Absolute values (accuracy): 93.3% after feature selection vs 89.3% on raw data
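
To put the headline figures in context, the short calculation below works through the arithmetic using the accuracies stated in the abstract (89.3% and 93.3%) and the class counts given under Population. The majority-class baseline is an assumption-laden reference point: it is computed on the full 1,190 records, whereas the size and composition of the study's test split after outlier removal are not given here.

```python
# Class counts as reported under Population.
n_disease, n_normal = 629, 561
n_total = n_disease + n_normal

# Accuracy of always predicting the larger class (a naive reference point).
majority_baseline = max(n_disease, n_normal) / n_total

# Accuracies as stated in the abstract.
acc_raw, acc_selected = 0.893, 0.933

# Absolute gain in percentage points.
absolute_gain_pp = (acc_selected - acc_raw) * 100

# Relative reduction in error rate: (10.7% - 6.7%) / 10.7%.
relative_error_reduction = ((1 - acc_raw) - (1 - acc_selected)) / (1 - acc_raw)

print(f"records: {n_total}")
print(f"majority-class baseline: {majority_baseline:.1%}")
print(f"absolute gain: {absolute_gain_pp:.1f} percentage points")
print(f"relative error reduction: {relative_error_reduction:.1%}")
```

The gain is 4.0 percentage points in accuracy, which corresponds to roughly a 37% relative reduction in the error rate.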

Limitations

The study used a retrospective dataset from the UCI repository, which limits generalizability to clinical practice.
No external validation on independent clinical datasets was reported.
Female participant percentage and demographic breakdowns were not fully specified.
The study is methodological and does not report clinical trial or patient outcome data.
No statistical significance testing for primary endpoint improvements was provided.

Abstract

Heart disease is a leading global cause of morbidity and mortality. Accurate and prompt diagnoses are crucial for its effective prevention and management. Integrating multiple machine learning algorithms, this research introduces a stacked ensemble machine learning model, called CardiaTics (stands for Cardiac DiagnosTics), toward improving heart disease detection. We detect outliers and remove them as a first step to ensure data quality and maintain integrity. Ten distinct machine learning algorithms are then individually applied, culminating in the creation of a stacked ensemble model. We use feature engineering to refine the model further, applying three well-known techniques: Pearson correlation, Chi-Square Test (Chi-2), and Recursive Feature Elimination. The implementation of these techniques on the benchmark dataset results in an optimized feature set. Experimental results show that CardiaTics delivers 89.3% accuracy on raw data, and significantly improves its accuracy after feature selection to 93.3%, outperforming the individual classifiers. However, can human professionals rely on algorithms for prediction when the underlying process is not fully understood? To address concerns regarding interpretability, trust, and transparency in black-box predictions, we propose utilizing SHapley Additive exPlanations (SHAP) and Explain Like I’m 5 (ELI5) in the second phase to elucidate feature importance in our model. The SHAP summary plots of CardiaTics reveal that the positive and negative contributors to heart disease are comparable, thereby enhancing the model’s interpretability and reliability and helping refine the decision-making process.
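
The idea behind the SHAP explanations mentioned in the abstract can be shown with exact Shapley values on a tiny model. This is a sketch under stated assumptions: the scoring function, the instance, and the all-zero baseline are hypothetical, and "absent" features are simply replaced by their baseline value. The SHAP library relies on approximations and background data rather than this brute-force enumeration, which grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature subset."""
    n = len(instance)

    def value(subset):
        # Features in the subset take the instance value, the rest the baseline.
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def risk_score(x):
    """Hypothetical toy model with one interaction term (not the CardiaTics model)."""
    return 0.5 * x[0] + 0.3 * x[1] + 0.2 * x[0] * x[2]


instance = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(risk_score, instance, baseline)
print([round(p, 3) for p in phis])
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term's contribution is split evenly between the two features involved.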

Cite This Study

Ghose et al. (2026) evaluated CardiaTics, a stacked ensemble machine learning model with feature engineering using Pearson correlation, Chi-Square Test, and Recursive Feature Elimination (RFE), against individual machine learning classifiers (e.g., Random Forest, MLP, KNN, ETC, XGB, SVC, SGD, ABC, CART, GBM) in individuals with suspected heart disease aged 30-77 from the UCI Machine Learning Repository dataset (n=1,190). The outcome was accuracy of heart disease prediction. CardiaTics accuracy improved from 89.3% on raw data to 93.3% after feature selection, outperforming the individual classifiers.

synapsesocial.com/papers/69aa70b8531e4c4a9ff5ac10 https://doi.org/10.1186/s40537-026-01395-8

Also Consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: