Real-world datasets have many shortcomings that pose challenges to machine learning. High dimensionality and imbalanced class prevalence are two important ones: class imbalance degrades classification, and high dimensionality can lead to suboptimal classifier performance. In this paper, we explore and analyse different feature selection methods for a clinical dataset that suffers from both high dimensionality and class imbalance. The aim is to investigate the effect of imbalanced data on feature selection. We first apply the feature selection methods to the original data to select a subset of features and then resample the dataset. In addition, we resample the dataset first and apply the feature selection methods to the balanced classes, so that the results can be compared with those from the original data. Random forest and J48 were used to evaluate the resulting samples. The experiments confirm that resampling the imbalanced classes yields a significant increase in classification performance for both classification methods, random forest and J48. Furthermore, the measure most affected by balancing is specificity, which increases sharply for all methods. Moreover, feature subsets selected from the balanced data improve performance only for information gain; for the other methods, performance decreases.
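The two orderings compared above (select features on the original data, or resample first and then select) can be illustrated with a minimal sketch. It assumes random oversampling as the resampling step and uses a hypothetical toy dataset with two binary features in which the negative class is the minority. The abstract does not state which resampling technique was used, and random forest and J48 are not reproduced here. The function names are illustrative only.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Information gain of one discrete feature with respect to the labels."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def random_oversample(rows, labels, seed=0):
    """Assumed resampling step: duplicate random minority rows until classes match."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_rows, new_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            new_rows.append(rng.choice(pool))
            new_labels.append(cls)
    return new_rows, new_labels


def specificity(y_true, y_pred, negative=0):
    """True negative rate: TN / (TN + FP)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == negative and p == negative)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == negative and p != negative)
    return tn / (tn + fp) if (tn + fp) else 0.0


def gains(rows, labels):
    """Information gain of every feature column."""
    return [information_gain([r[j] for r in rows], labels) for j in range(len(rows[0]))]


# Hypothetical toy data: eight positive (1) and two negative (0) cases.
rows = [(1, 0), (1, 1), (1, 0), (1, 1), (1, 0), (1, 1), (1, 0), (0, 1), (0, 0), (0, 1)]
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

# Ordering 1: score features on the original, imbalanced data.
gains_original = gains(rows, labels)

# Ordering 2: balance the classes first, then score the features.
balanced_rows, balanced_labels = random_oversample(rows, labels)
gains_balanced = gains(balanced_rows, balanced_labels)

print("information gain, original:", gains_original)
print("information gain, balanced:", gains_balanced)

# A classifier that always predicts the majority class never detects a negative case.
majority_only = [1] * len(labels)
print("specificity of majority-only predictions:", specificity(labels, majority_only))
```

Because the feature scores depend on the class distribution, the two orderings can rank features differently. The last lines show why specificity is the measure most sensitive to imbalance when the negative class is rare: predicting the majority class everywhere gives high accuracy but no true negatives.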
Mohammad Al Khaldy studied this question.