ABSTRACT Privacy preservation refers to the methods and procedures used to prevent the disclosure of private or sensitive information during data collection, processing, dissemination, or analysis. K‐prototypes has become a widely adopted clustering technique for mixed‐type data in large‐scale data mining tasks because of its efficiency and simplicity. However, because the underlying data frequently include sensitive user information, the approach raises privacy concerns. Conventional privacy‐preserving clustering techniques usually rely on a trusted third party to perform data preprocessing, yet it is frequently impractical to place full trust in such an organization. By applying local differential privacy directly to user data, this study presents a novel Two‐tier Perturbation‐based Local Differential Privacy Preservation K‐prototyping (TP‐LDPK) architecture that eliminates the need for a trusted third party. Numerical and categorical data are preprocessed using the Enhanced Min‐Max Normalization (EMN) technique and the Optimized Unary Encoding (OUE) algorithm, respectively. In the first‐tier perturbation, the Improved chaotic map and the Obfuscation approach are used to perturb the normalized numerical data and the encoded categorical data, respectively. In the second tier, the Generalized Random Response (GRR) algorithm is used to determine the new centroid set from the perturbed cluster information. This technique protects sensitive data at every stage while enabling clustering through direct interaction between the user and the server. Theoretical and empirical assessments support the approach's privacy guarantees and practical effectiveness, showing that it produces high‐quality clustering results under stringent local privacy restrictions. The TP‐LDPK method yielded the lowest recall rate (0.494), accuracy rate (0.291), and F‐measure (0.387).
Kancharla et al. studied this question.
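The building blocks named in the abstract can be illustrated with a minimal Python sketch. It uses the textbook forms of min-max normalization, Optimized Unary Encoding, and k-ary randomized response. The abstract does not specify the "Enhanced" normalization, the Improved chaotic map, or the Obfuscation step, so those are not reproduced here. All function names and parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def min_max_normalize(values):
    """Plain min-max scaling to [0, 1] (assumption: the paper's EMN variant is not specified)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def oue_perturb(index, domain_size, epsilon, rng):
    """Textbook OUE on a one-hot vector with the 1 at position `index`.

    The 1-bit is reported as 1 with probability 1/2; each 0-bit is
    reported as 1 with probability 1 / (e^epsilon + 1).
    """
    p = 0.5
    q = 1.0 / (math.exp(epsilon) + 1.0)
    return [
        1 if rng.random() < (p if i == index else q) else 0
        for i in range(domain_size)
    ]


def grr_perturb(value, domain, epsilon, rng):
    """Textbook k-ary randomized response over `domain`.

    The true value is reported with probability e^epsilon / (e^epsilon + k - 1);
    otherwise one of the other k - 1 values is reported uniformly at random.
    """
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [d for d in domain if d != value]
    return rng.choice(others)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    print(min_max_normalize([10, 20, 40]))
    print(oue_perturb(index=1, domain_size=4, epsilon=1.0, rng=rng))
    print(grr_perturb("c2", ["c1", "c2", "c3"], epsilon=1.0, rng=rng))
```

In a setting like the one the abstract describes, each user would run such perturbation locally and send only the randomized reports to the server, which is why no trusted third party is needed. How TP‐LDPK combines these primitives across its two tiers is defined in the paper itself, not in this sketch.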