Data quality supports the development of good prediction models for diagnosis and treatment, and thereby improves patient outcomes. This study aimed to apply a preprocessing approach to a home medical dataset in order to improve data quality and build a better prediction model. The home dataset was obtained from breast cancer screening ultrasound results in patients' records at Benghazi Medical Centre. Regression imputation and SMOTE were used to resolve missing values and class imbalance, respectively. These techniques were also used to generate three datasets in addition to the original dataset. Cfs and ReliefF were selected as the feature selection techniques. The study found that the (SMOTE) dataset and the (Regression imputation + SMOTE) dataset gave the best results in terms of performance and ROC area. The Cfs feature selection technique also achieved better results than the ReliefF feature selection technique. The "BIRADs4" class was difficult to classify correctly because its important patterns were hard to recognize. The study concluded that the (Regression imputation + SMOTE) dataset with Cfs feature selection was the most suitable preprocessing for this dataset.
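To make the two preprocessing steps concrete, the following is a minimal sketch in plain Python, not the study's own implementation. It assumes a single numeric predictor for the regression imputation and uses a simplified form of SMOTE that interpolates between a minority sample and one of its nearest neighbours. The function names, column positions, and toy values are all hypothetical and are not taken from the Benghazi Medical Centre data.

```python
import math
import random


def regression_impute(rows, target, predictor):
    """Fill missing values (None) in column `target` using a least-squares
    line fitted on the rows where both columns are observed."""
    complete = [(r[predictor], r[target]) for r in rows
                if r[target] is not None and r[predictor] is not None]
    n = len(complete)
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in complete)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    filled = []
    for r in rows:
        r = list(r)  # copy so the input rows are left unchanged
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor]
        filled.append(r)
    return filled


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples, each placed at a random
    position on the segment between a sample and one of its k nearest
    minority-class neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features, one missing value in column 1.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0]]
filled = regression_impute(rows, target=1, predictor=0)

# Treat the imputed rows as the minority class and oversample it.
synthetic = smote(filled, n_new=5, k=2, seed=42)
```

Imputing before oversampling, as in this sketch, mirrors the order implied by the (Regression imputation + SMOTE) dataset: the nearest-neighbour distances in SMOTE need complete feature vectors. In practice a maintained library implementation would normally be preferred over hand-written code like this.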
Eltalhi et al. (Mon,) studied this question.