Does a tuned Logistic Regression model with power transform and Kernel PCA improve heart disease prediction accuracy on UCI datasets?
A machine learning model combining power transformation, Kernel PCA, and Logistic Regression tuned with Grid Search achieved 100% accuracy in predicting heart disease on UCI datasets.
Heart disease has become the most critical human disease, and the rate of heart failure has increased. Accurate diagnosis and timely treatment are needed to prevent deaths. In this work, heart disease is predicted by a Machine Learning (ML) model trained on the UCI datasets. Univariate and multivariate analysis of the dataset is performed using statistical methods to check for class imbalance, skewness and kurtosis in the feature distributions, and correlation between the features. Because the distributions of a few features are skewed and not normally distributed, a power transformation is used to bring the data closer to a normal distribution. Before the transformation, outliers are detected and removed using the Tukey fences method. The best features are selected based on the correlation matrix and several feature selection approaches, including the Extra Trees Classifier. Random Search and Grid Search are used to tune the hyperparameters of the Logistic Regression algorithm. Performance is evaluated using the confusion matrix, accuracy score, precision-recall curve (PRC), and receiver operating characteristic (ROC) curve. Seven models are developed from various combinations of these steps; the model in which a power transform and Kernel Principal Component Analysis (Kernel PCA) are applied to the dataset and the hyperparameters are then tuned with Grid Search gives 100% accuracy.
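The best-performing combination described above (Tukey fences, power transform, Kernel PCA, Grid Search over Logistic Regression) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: the data is a synthetic stand-in from make_classification rather than the UCI heart disease data, and the fence multiplier k=1.5, the Yeo-Johnson method, the RBF kernel, the parameter grid, and the choice to filter outliers on the training split only are all assumptions. It is not expected to reproduce the reported accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer


def tukey_fence_mask(X, k=1.5):
    """Boolean mask of rows whose every feature lies within the Tukey fences.

    Fences are [Q1 - k*IQR, Q3 + k*IQR] per feature; k=1.5 is an assumed default.
    """
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)


# Synthetic stand-in data (assumption): 13 features, binary target.
X, y = make_classification(
    n_samples=600, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Remove outlier rows from the training split only (an assumed design choice).
mask = tukey_fence_mask(X_train)
X_train, y_train = X_train[mask], y_train[mask]

# Power transform -> Kernel PCA -> Logistic Regression, tuned as one pipeline
# so that every cross-validation fold refits the preprocessing steps.
pipe = Pipeline([
    ("power", PowerTransformer(method="yeo-johnson")),
    ("kpca", KernelPCA(kernel="rbf")),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid; the values are assumptions, not the paper's settings.
param_grid = {
    "kpca__n_components": [4, 8],
    "kpca__gamma": [0.01, 0.1],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Wrapping the preprocessing in a Pipeline keeps the power transform and Kernel PCA inside each cross-validation fold, which avoids leaking held-out data into the fitted transforms. That matters when judging a very high accuracy figure.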
Ambesange et al. studied this question.