August 21, 2025 · Open Access

Privacy-Preserving Attribute Domain Reconstruction for Machine Learning

Key Points

The proposed method reduces attribute domain size while preserving data utility, improving the privacy-utility balance under local differential privacy.
Under local differential privacy, the approach combines data perturbation with reconstruction of attribute domains.
Evaluation on two databases, ADULT and WDBC, demonstrated its effectiveness in enhancing both utility and privacy.
The findings suggest broader applicability of local differential privacy to various high-dimensional datasets.
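Why a smaller domain helps can be seen with generalized randomized response (k-RR), a standard LDP mechanism for a discrete domain of size L. The abstract does not say which perturbation mechanism SUPM uses; k-RR is assumed here purely for illustration. Each provider holding value x reports y with:

```latex
\Pr[y = x] = \frac{e^{\varepsilon}}{e^{\varepsilon} + L - 1},
\qquad
\Pr[y = v] = \frac{1}{e^{\varepsilon} + L - 1} \quad (v \neq x),
\qquad
\frac{\Pr[y \mid x]}{\Pr[y \mid x']} \le e^{\varepsilon}.
```

With ε = 1, the probability of reporting the true value is about 0.731 for L = 2 but only about 0.232 for L = 10, so at the same privacy level a smaller domain keeps more of the true signal.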

Abstract

In modern society, personal data is increasingly used across many fields. However, such data utilization also increases the risk of privacy leakage. Differential Privacy (DP) has therefore been proposed as a measure of privacy protection that applies when data collectors release data. Because DP requires trusting the data collector, Local Differential Privacy (LDP) was proposed as a privacy protection measure that does not rely on a trusted third party. Under LDP, data providers perturb their own data before sharing it, thereby preventing privacy leakage from personal data. LDP is useful in machine learning for protecting both data privacy and model privacy. However, a challenge with LDP is balancing privacy protection and utility when dealing with high-dimensional data. To address this, techniques such as dimensionality reduction and data discretization have been proposed. A machine learning framework called SUPM has been proposed to satisfy LDP. In SUPM, all attribute types, including categorical and numerical, are converted into ordered discrete sets with domain size L; uniform weak anonymization is then performed and perturbation is applied uniformly. As a result, the domain and domain size L of each attribute must be fixed in advance, regardless of the data characteristics, so the characteristics of each attribute, such as its utility, must be known beforehand. This study proposes an attribute domain reconstruction method that reduces domain size while preserving data utility, using the data collected during dimensionality reduction. The effectiveness of the proposed method is validated using two databases: ADULT and WDBC.
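The pipeline described above (discretize each attribute into an ordered set of size L, perturb locally, then aggregate) can be sketched as follows. This is a minimal illustration under stated assumptions, not SUPM or the proposed reconstruction method: equal-width binning, generalized randomized response (k-RR), and the function names discretize, krr_perturb, and estimate_frequencies are all hypothetical choices made here. Categorical attributes would first need an ordering onto 0..L-1, which is not shown.

```python
import math
import random
from collections import Counter


def discretize(value, lo, hi, L):
    """Map a numerical value in [lo, hi] to one of L ordered bins 0..L-1.

    Equal-width binning is an illustrative assumption, not SUPM's rule.
    """
    if hi <= lo or L < 2:
        raise ValueError("need hi > lo and L >= 2")
    idx = int((value - lo) / (hi - lo) * L)
    return min(max(idx, 0), L - 1)  # clamp so value == hi lands in the last bin


def krr_perturb(x, L, epsilon, rng):
    """Generalized randomized response (k-RR) over the domain {0, ..., L-1}.

    k-RR is an assumed mechanism chosen for illustration.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + L - 1)
    if rng.random() < p:
        return x  # report the true value
    # Otherwise report one of the other L-1 values uniformly at random.
    y = rng.randrange(L - 1)
    return y if y < x else y + 1


def estimate_frequencies(reports, L, epsilon):
    """Unbiased estimate of each value's true frequency from k-RR reports."""
    e = math.exp(epsilon)
    p = e / (e + L - 1)  # probability of keeping the true value
    q = 1 / (e + L - 1)  # probability of reporting one specific other value
    n = len(reports)
    counts = Counter(reports)
    # E[count_v / n] = f_v * p + (1 - f_v) * q, solved for f_v.
    return [(counts[v] / n - q) / (p - q) for v in range(L)]


if __name__ == "__main__":
    rng = random.Random(0)
    L, epsilon = 4, 2.0
    # Synthetic attribute values, generated only for this demo.
    raw = [rng.uniform(0, 40) for _ in range(20000)]
    bins = [discretize(v, 0, 100, L) for v in raw]
    reports = [krr_perturb(b, L, epsilon, rng) for b in bins]
    print(estimate_frequencies(reports, L, epsilon))
```

For a fixed ε, a smaller L raises the probability p of a truthful report, which is the intuition behind reducing domain size. How the proposed method decides which domain values to reconstruct is not described in the abstract and is not modeled here.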
