Background and Objectives: Diabetes mellitus is a chronic disease that is prevalent worldwide and carries significant health and economic burdens. Early diagnosis is critical for effective disease management; however, the multidimensional and complex nature of diabetes makes accurate prediction challenging, and the literature emphasizes that feature selection remains an active research problem. In this study, an original feature selection approach based on the Apriori algorithm, traditionally used in market basket analysis, is proposed to identify symptom patterns associated with diabetes. Materials and Methods: A real-world dataset consisting of 16 categorical symptom variables from 520 individuals, obtained from the UCI Machine Learning Repository, was used. Preprocessing comprised variable encoding, missing data checks, and continuous variable transformations. The core symptoms frequently associated with diabetes were identified through association analysis using the Apriori algorithm, and these features were then used for classification with four machine learning algorithms (K-Nearest Neighbors, Support Vector Machines, Artificial Neural Networks, and Random Forests). Models were evaluated on both the full and the reduced datasets using accuracy, precision, sensitivity, specificity, and F1 score. Results: Feature selection was found to significantly improve model performance, with the SVM algorithm achieving the highest accuracy (97%) and an F1 score of 0.961. KNN stood out in identifying positive cases, with a sensitivity of 0.975. Conclusions: These findings indicate that Apriori-based feature selection is an effective and explainable method for symptom-based diabetes prediction. The method can contribute to the development of low-cost, symptom-based decision support systems, especially in resource-limited settings.
Ağlarcı et al. studied this question.
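To make the idea of Apriori-based feature selection more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It mines frequent itemsets level by level, then keeps every symptom that appears in the antecedent of a rule pointing to the positive class with sufficient confidence. The toy records, the label name `positive`, the helper names `frequent_itemsets` and `select_features`, and both thresholds are illustrative assumptions; the abstract does not state the support or confidence values actually used.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {frozenset: support} for all frequent itemsets."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]  # size-1 candidates
    frequent = {}
    k = 1
    while current:
        # Count the support of each size-k candidate and keep the frequent ones.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def select_features(transactions, target, min_support, min_confidence):
    """Collect items from antecedents of rules `antecedent -> target`."""
    freq = frequent_itemsets(transactions, min_support)
    selected = set()
    for itemset, support in freq.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            # confidence(A -> target) = support(A and target) / support(A)
            confidence = support / freq[antecedent]
            if confidence >= min_confidence:
                selected |= antecedent
    return selected


# Hypothetical toy records: each set lists the symptoms present in one
# individual, plus "positive" when the individual has diabetes.
transactions = [
    {"polyuria", "polydipsia", "positive"},
    {"polyuria", "polydipsia", "positive"},
    {"polyuria", "weakness", "positive"},
    {"polydipsia", "positive"},
    {"weakness"},
    {"itching"},
    {"weakness", "itching"},
    {"polyuria", "positive"},
]

selected = select_features(transactions, "positive",
                           min_support=0.25, min_confidence=0.8)
print(sorted(selected))  # ['polydipsia', 'polyuria']
```

In a pipeline like the one described above, the selected symptoms would define the reduced dataset on which the classifiers are trained and compared against the full feature set.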