What type of study is this?

This is a quantitative study.

September 18, 2025

Data Mining Approach: K-Means Clustering and Naïve Bayes Classifier for Graduate Quality Analysis

Key Points

The Naïve Bayes classification model achieved an accuracy of 95.24% in predicting graduate quality.
K-Means clustering grouped graduates into three clusters based on similarity, with the number of clusters chosen using the Silhouette Score.
The data analysis followed the CRISP-DM methodology, providing a structured approach to the study.
These findings highlight the potential of combining K-Means and Naïve Bayes to support decision-making in higher education.

Abstract

Data mining techniques play an important role in educational data analysis, especially in evaluating the quality of graduates from tracer study data. This study applies the K-Means Clustering algorithm to group graduate data and the Naïve Bayes Classifier to classify graduate quality based on the characteristics of each cluster. The methodology follows the CRISP-DM stages, with data obtained from the Tracer Study and PDDikti. The K-Means algorithm groups graduates into three clusters based on similar characteristics; the value of K was chosen by searching for the optimum using the Silhouette Score. The data is then balanced using the SMOTE-ENN method, and the Naïve Bayes model is used to classify the data into the resulting clusters. The evaluation results show that the classification model performs very well, with an accuracy of 95.24%, a precision of 93.33%, a recall of 96.67%, and an F1-score of 94.54%. These findings indicate that the combination of the K-Means and Naïve Bayes algorithms can be applied effectively to cluster and predict graduate quality, and can serve as a decision-making tool for improving the quality of higher education.
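
The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data is synthetic (`make_blobs` stands in for the Tracer Study and PDDikti records, which are not available here), the candidate range for K, the Gaussian variant of Naïve Bayes, the 70/30 split, and macro averaging are all assumptions, and the SMOTE-ENN balancing step is omitted to keep the example dependent on scikit-learn only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, silhouette_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the graduate data (assumption: 4 numeric features).
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X = StandardScaler().fit_transform(X)


def pick_k(features, candidates=range(2, 7), seed=0):
    """Return the candidate K with the highest silhouette score, plus all scores."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return max(scores, key=scores.get), scores


# Step 1: choose K by silhouette score and assign each record to a cluster.
best_k, scores = pick_k(X)
cluster_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

# Step 2 (omitted): the study balances the data with SMOTE-ENN at this point.

# Step 3: train Naive Bayes to predict the cluster label of unseen records.
X_train, X_test, y_train, y_test = train_test_split(
    X, cluster_labels, test_size=0.3, stratify=cluster_labels, random_state=0
)
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Step 4: evaluate with the four metrics reported in the abstract.
# Macro averaging is an assumption; the abstract does not state the averaging mode.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
    "recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
    "f1": f1_score(y_test, y_pred, average="macro", zero_division=0),
}
print(best_k, metrics)
```

The scores printed by this sketch describe the synthetic data only and are not comparable to the figures reported in the study. Where class balancing is needed, the imbalanced-learn package provides a `SMOTEENN` resampler that could be applied to the training split before fitting the classifier.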
