This study introduces a new clustering method based on the multivariate fuzzy K-modes algorithm. The proposed algorithm incorporates attribute weights determined by Gini impurity, which evaluates the significance of attribute values in terms of both within-cluster and between-cluster variance. Additionally, instead of relying on the Hamming distance, a probabilistic distance is employed to compute the dissimilarity between objects and between objects and their corresponding centroids. This study also uses a particle swarm optimization (PSO) algorithm to search for the optimal centroids, replacing the random generation of initial centroids and supporting global optimization. Accordingly, the proposed algorithm is named the PSO-based multivariate fuzzy weighted fuzzy K-modes algorithm with probabilistic distance (PSO-MFWFKM-PD). The proposed algorithm is evaluated against other benchmark algorithms in terms of accuracy (AC) and the Davies-Bouldin Index (DBI) on five benchmark datasets. The results demonstrate that PSO-MFWFKM-PD outperforms the other algorithms. Furthermore, the algorithm is applied to a real-world market segmentation case study using a soft-drink consumer dataset from Thailand collected through an online questionnaire. The results from this application are also promising.
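As a rough illustration of two ingredients named above, the following Python sketch computes the standard Gini impurity of each categorical attribute within a cluster and the Hamming distance between two categorical objects. It is a minimal sketch under stated assumptions, not the weighting scheme or the probabilistic distance of PSO-MFWFKM-PD, whose exact formulas are not given here; the function names and the toy data are hypothetical.

```python
from collections import Counter


def gini_impurity(values):
    """Standard Gini impurity 1 - sum(p_k^2) over category frequencies."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def hamming_distance(x, y):
    """Number of attributes on which two categorical objects differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


# Hypothetical toy cluster: each object is (colour, size).
cluster = [
    ("red", "small"),
    ("red", "large"),
    ("red", "small"),
    ("red", "large"),
]

# Impurity of each attribute (column) within the cluster.
per_attribute = [gini_impurity(column) for column in zip(*cluster)]

# Baseline dissimilarity that the abstract says is replaced
# by a probabilistic distance.
baseline = hamming_distance(cluster[0], cluster[1])
```

In this toy cluster the first attribute takes a single value and so has zero impurity, while the second is evenly split between two values. How such impurities are turned into attribute weights is an assumption left open by this sketch.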
Kuo et al. (Mon,) studied this question.