What question did this study set out to answer?

The aim is to enhance privacy during data clustering by using local differential privacy without relying on a third party.

February 22, 2026

Local Differential Privacy Preservation for Distributed Data With an Improved Distance‐Based Clustering Method

Key Points

The aim is to enhance privacy during data clustering by using local differential privacy without relying on a third party.
Developed the Two-tier Perturbation-based Local Differential Privacy Preservation K-prototyping (TP-LDPK) architecture.
Perturbed the normalized numerical data with the Improved chaotic map and the encoded categorical data with the Obfuscation approach.
Applied the Generalized Random Response (GRR) algorithm for centroid determination.
TP-LDPK achieved clustering while ensuring local privacy protection.
Outcome metrics included a recall rate of 0.494, accuracy of 0.291, and F-measure of 0.387.
The approach showed practical effectiveness in maintaining data privacy.
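
The GRR step named above can be illustrated with a minimal sketch of generalized random (randomized) response in its commonly described form. It is not taken from the paper: the function names, the privacy budget `epsilon`, and the idea that each user reports a cluster label are assumptions made for illustration. Each user keeps the true value with probability p = e^ε / (e^ε + k − 1) and otherwise reports one of the other k − 1 values uniformly at random; the server then debiases the observed counts.

```python
import math
import random


def grr_perturb(value, domain, epsilon, rng=random):
    """Client side: keep `value` with probability p, else report another item."""
    k = len(domain)
    if k < 2:
        raise ValueError("GRR needs a domain with at least two values")
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    # Pick uniformly among the k - 1 values that differ from the true one.
    others = [d for d in domain if d != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Server side: unbiased frequency estimate for each value in `domain`."""
    k = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    return {d: (reports.count(d) / n - q) / (p - q) for d in domain}


if __name__ == "__main__":
    # Hypothetical cluster labels held by 1,000 users (illustrative data only).
    rng = random.Random(0)
    clusters = ["c0", "c1", "c2"]
    truth = [rng.choice(clusters) for _ in range(1000)]
    noisy = [grr_perturb(v, clusters, epsilon=1.0, rng=rng) for v in truth]
    print(grr_estimate(noisy, clusters, epsilon=1.0))
```

The estimates always sum to 1, but individual entries can fall slightly below 0 or above 1 for small samples or small `epsilon`; how TP-LDPK turns such estimates into a new centroid set is not described here.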

Abstract

Privacy preservation refers to the methods and procedures used to prevent the disclosure of private or sensitive information during data collection, processing, dissemination, or analysis. K‐prototypes has become a widely adopted clustering technique for mixed‐type data in large‐scale data mining tasks because of its efficiency and simplicity. However, because the underlying data frequently include sensitive user information, this approach raises privacy concerns. Conventional privacy‐preserving clustering techniques usually rely on a trusted third party to perform data preprocessing, but it is often impractical to place complete trust in such a party. By applying local differential privacy directly to user data, this study presents a novel Two‐tier Perturbation‐based Local Differential Privacy Preservation K‐prototyping (TP‐LDPK) architecture that removes the need for a trusted third party. Numerical data are preprocessed with the Enhanced Min‐Max Normalization (EMN) technique and categorical data with the Optimized Unary Encoding (OUE) algorithm. In the first‐tier perturbation, the Improved chaotic map perturbs the normalized numerical data and the Obfuscation approach perturbs the encoded categorical data. In the second tier, the Generalized Random Response (GRR) algorithm is used to determine the new centroid set from the perturbed cluster information. The technique protects sensitive data at every stage while enabling clustering through direct interaction between the user and the server. Theoretical and empirical assessments indicate that the approach produces high‐quality clustering results under stringent local privacy constraints, supporting its practical effectiveness and privacy guarantees. The TP‐LDPK method yielded the lowest recall rate (0.494), accuracy rate (0.291), and F‐measure (0.387).
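
As a rough illustration of the preprocessing described in the abstract, the sketch below shows plain min‐max normalization for a numerical attribute and unary (one‐hot) encoding for a categorical attribute, followed by the bit‐flipping step of Optimized Unary Encoding as it is commonly defined. The paper's "Enhanced" normalization, its chaotic map, and its Obfuscation step are not specified here, so none of them is reproduced. The function names and the publicly known bounds `lo` and `hi` are assumptions made for this example.

```python
import math
import random


def min_max_normalize(value, lo, hi):
    """Scale `value` into [0, 1] using publicly known bounds `lo` and `hi`."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(value, lo), hi)  # clamp out-of-range inputs
    return (clipped - lo) / (hi - lo)


def unary_encode(value, domain):
    """One-hot encode a categorical value over an ordered `domain`."""
    return [1 if d == value else 0 for d in domain]


def oue_perturb(bits, epsilon, rng=random):
    """Standard OUE perturbation: a 1 bit stays 1 with probability 1/2,
    and a 0 bit becomes 1 with probability 1 / (e^epsilon + 1)."""
    p = 0.5
    q = 1.0 / (math.exp(epsilon) + 1.0)
    return [1 if rng.random() < (p if b == 1 else q) else 0 for b in bits]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical user record: age in [0, 100] and one of three job types.
    age = min_max_normalize(37, lo=0, hi=100)
    job_bits = unary_encode("nurse", ["clerk", "nurse", "pilot"])
    print(age, job_bits, oue_perturb(job_bits, epsilon=1.0, rng=rng))
```

Using fixed public bounds rather than the observed minimum and maximum keeps the normalization local to each user, which matches the no-trusted-third-party setting; whether EMN works this way is an assumption.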
