In today's digitally driven world, data privacy is of utmost importance to both individuals and corporations. Data owners need to share their data so that users can derive utility from it, but releasing data in its original form poses privacy risks. One viable solution is to generate synthetic datasets from the original data that provide similar utility while mitigating privacy concerns. Various methods have been adopted to generate synthetic data, with Generative Adversarial Networks (GANs) being the most popular approach. In this paper, we generate synthetic data using different GAN variants and evaluate their performance in terms of the privacy-utility tradeoff. We compare these GAN variants against two references: the baseline case, in which the data is released in its original form, and the Gaussian copula method, which models the dependence structure between multiple variables. Our analysis indicates that the Conditional Tabular GAN (CTGAN) is the most effective of the evaluated approaches for generating synthetic datasets that address the privacy-utility tradeoff.
Sakib et al. studied this question.
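To make the Gaussian copula comparison method mentioned in the abstract more concrete, here is a minimal sketch of the general technique using only the Python standard library. It is not the authors' implementation: the function names (`normal_scores`, `correlation_matrix`, `cholesky`, `sample_gaussian_copula`) are hypothetical, and the marginals are reproduced by resampling observed values, a simplification that practical tools would likely replace with fitted distributions.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def normal_scores(column):
    """Map a numeric column to standard-normal scores via its empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1).
        scores[i] = _STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def correlation_matrix(score_columns):
    """Pearson correlation between the columns of normal scores."""
    n = len(score_columns[0])
    centered = []
    for col in score_columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    d = len(centered)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(d)
        ]
        for i in range(d)
    ]


def cholesky(matrix):
    """Lower-triangular Cholesky factor; assumes a positive-definite matrix."""
    d = len(matrix)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low


def sample_gaussian_copula(columns, n_samples, seed=0):
    """Draw synthetic rows that mimic the dependence between the columns.

    `columns` is a list of equal-length numeric lists (one per variable).
    """
    rng = random.Random(seed)
    scores = [normal_scores(col) for col in columns]
    low = cholesky(correlation_matrix(scores))
    sorted_cols = [sorted(col) for col in columns]
    n, d = len(columns[0]), len(columns)
    rows = []
    for _ in range(n_samples):
        # Independent normals, then correlate them with the Cholesky factor.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        correlated = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        row = []
        for j, value in enumerate(correlated):
            # Normal score -> uniform -> empirical quantile of the original column.
            u = _STD_NORMAL.cdf(value)
            row.append(sorted_cols[j][min(int(u * n), n - 1)])
        rows.append(row)
    return rows
```

Calling `sample_gaussian_copula([xs, ys], 300)` on two correlated columns returns 300 synthetic rows whose columns are correlated in roughly the same way. Because this sketch reuses observed values for the marginals, it illustrates the dependence-modelling idea only and makes no claim about privacy protection.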