Personal data is increasingly used in many fields, but this use also raises the risk of privacy leakage. Differential Privacy (DP) was proposed as a privacy protection measure for the case where a data collector releases data. Because DP requires trusting the data collector, Local Differential Privacy (LDP) was proposed as a measure that does not rely on a trusted third party. Under LDP, data providers perturb their own data before sending it, which prevents privacy leakage from personal data. LDP is useful in machine learning for both data privacy and model privacy. A challenge with LDP, however, is that privacy protection and utility are difficult to balance for high-dimensional data. Techniques such as dimensionality reduction and data discretization have been proposed to address this. SUPM is a machine learning framework proposed to satisfy LDP. In SUPM, attributes of all types, including categorical and numerical, are converted into ordered discrete sets with domain size L; uniform weak anonymization is performed and perturbation is applied uniformly. The domain and the domain size L of each attribute must therefore be fixed in advance, regardless of the data characteristics, which in turn requires prior knowledge of each attribute's characteristics, such as its utility. This study proposes an attribute domain reconstruction method that reduces domain size while preserving data utility, using data collected during dimensionality reduction. The effectiveness of the proposed method is validated on two datasets: ADULT and WDBC.
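To illustrate why the domain size L matters under LDP, the following is a minimal sketch, not SUPM itself. It assumes equal-width binning for discretization and k-ary randomized response as the perturbation mechanism; the abstract does not specify either, and the function names `discretize`, `keep_probability`, and `randomized_response` are hypothetical.

```python
import math
import random


def discretize(value, low, high, L):
    """Map a numerical value to a bin index in {0, ..., L-1}.

    Assumption: equal-width bins over [low, high]; values outside are clipped.
    """
    if high <= low or L < 1:
        raise ValueError("need high > low and L >= 1")
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * L)
    return min(index, L - 1)  # value == high falls into the last bin


def keep_probability(L, epsilon):
    """Probability that k-ary randomized response reports the true index."""
    return math.exp(epsilon) / (math.exp(epsilon) + L - 1)


def randomized_response(index, L, epsilon, rng=random):
    """Perturb a bin index locally with k-ary randomized response.

    Assumption: this mechanism stands in for the perturbation step; it
    satisfies epsilon-LDP over a domain of size L.
    """
    if rng.random() < keep_probability(L, epsilon):
        return index
    # Otherwise pick uniformly among the L - 1 other indices.
    other = rng.randrange(L - 1)
    return other if other < index else other + 1


if __name__ == "__main__":
    bin_index = discretize(37.5, low=0.0, high=100.0, L=4)
    report = randomized_response(bin_index, L=4, epsilon=1.0)
    print(bin_index, report)
    # For a fixed epsilon, a smaller domain keeps the true value more often.
    print(keep_probability(4, 1.0), keep_probability(16, 1.0))
```

Under these assumptions, the probability of reporting the true value is e^ε / (e^ε + L − 1), which decreases as L grows. This is one way to see why reducing the domain size can improve utility at the same privacy level.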
Tsujimoto et al. (Thu,) studied this question.