This research applies machine learning techniques to predicting the risk of diabetes. Diabetes is a serious personal and medical problem, and accurate, efficient predictive models are vital for its early screening and detection. This paper uses the Pima Indian Diabetes Dataset to train models on individual clinical features such as blood glucose level, body mass index, and age, and evaluates the predictive capabilities of four widely used supervised learning algorithms: logistic regression, support vector machine, random forest, and neural network. Model performance was measured primarily by accuracy, precision, recall, and area under the ROC curve (AUC). Logistic regression achieved the highest AUC (0.84) on this small medical dataset. Additional experiments show that the factors with the greatest impact on diabetes prediction are blood glucose, body mass index, and age. Compared with more complex models, logistic regression has substantial advantages in stability and interpretability, which may make it better suited to prediction tasks in small clinical studies such as this one. This work also emphasizes the significance of feature analysis and model selection for medical AI applications, and provides empirical support for data-driven early-warning systems for disease prediction.
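The four evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions chosen for illustration, and none of the numbers come from the Pima Indian Diabetes Dataset. AUC is computed here from its pairwise-ranking definition, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half.

```python
from typing import Sequence, Tuple


def accuracy_precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for binary labels in {0, 1}."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = [0, 0, 1, 1, 0, 1]
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(accuracy_precision_recall(labels, preds))
    print(roc_auc(labels, scores))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a dataset of a few hundred records; a rank-based formula or a library routine would be the usual choice at larger scale.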
Yujia Tian (Thu,) studied this question.