The growing use of data-driven solutions in sensitive sectors such as healthcare and cybersecurity is often constrained by data scarcity and strict privacy requirements. This paper explores how generative artificial intelligence (AI) models, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs), can produce privacy-preserving synthetic datasets that supplement scarce data while preserving confidentiality. A descriptive-analytical research design was adopted and supported by empirical demonstrations in two domains: healthcare, where tabular patient records are used to predict disease, and cybersecurity, where benign network traffic flows are used to detect anomalies. The synthetic datasets were evaluated along three dimensions: fidelity (statistical similarity to the real data), utility (performance in downstream machine learning tasks), and privacy (risk of memorizing or leaking real records). The results show that class-conditional GMMs effectively modeled the distributions of patient features and improved predictive modeling when combined with real data, and that synthetic benign traffic supported effective anomaly detection in the cybersecurity task. Privacy evaluations indicated that no individual real record was memorized or reproduced, which reduces the risk of re-identification. Overall, the paper shows that generative AI can deliver high-fidelity, useful, and privacy-conscious synthetic datasets, offering a scalable solution to data scarcity, while underscoring the importance of rigorous validation, ethical oversight, and governance in the use of sensitive data.
Diwakar Ramanuj Tripathi studied this question.
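The class-conditional GMM synthesis and the utility and privacy checks summarized in the abstract can be illustrated with a minimal Python/NumPy sketch. This is not the author's implementation. The diagonal-covariance EM fit, the number of components (`k=2`), the toy two-class Gaussian data, the nearest-centroid classifier used for a train-on-synthetic, test-on-real (TSTR) utility check, and the distance-to-closest-record (DCR) privacy check are all illustrative assumptions. Every function name is hypothetical.

```python
import numpy as np

# Hypothetical helper: diagonal-covariance Gaussian mixture fitted with EM.
# Diagonal covariances are a simplifying assumption, not the paper's setting.
def fit_diag_gmm(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()   # init at random records
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))        # k x d
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log-likelihood of every record under every component (n x k)
        diff = X[:, None, :] - means[None, :, :]
        log_p = -0.5 * (np.sum(diff**2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)             # avoid underflow
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and per-feature variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 1e-6)
    return weights, means, variances

def sample_diag_gmm(params, n, rng):
    weights, means, variances = params
    comp = rng.choice(len(weights), size=n, p=weights)        # pick a component per row
    return rng.normal(means[comp], np.sqrt(variances[comp]))  # n x d samples

# Class-conditional synthesis: one mixture per label, sampled in the real class proportions.
def synthesize_class_conditional(X, y, k=2, seed=0):
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for label in np.unique(y):
        X_c = X[y == label]
        X_parts.append(sample_diag_gmm(fit_diag_gmm(X_c, k, seed=seed), len(X_c), rng))
        y_parts.append(np.full(len(X_c), label))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Utility (assumed protocol): train on synthetic, test on real, nearest-centroid classifier.
def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    labels = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return float((labels[dists.argmin(axis=1)] == y_test).mean())

# Privacy (assumed check): distance from each synthetic row to its closest real row.
# A distance of 0 would mean a real record was copied verbatim.
def distance_to_closest_record(X_syn, X_real):
    dists = np.linalg.norm(X_syn[:, None, :] - X_real[None, :, :], axis=2)
    return dists.min(axis=1)

# Toy stand-in for a two-class tabular dataset (not data from the paper).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(3.0, 1.0, (200, 3))])
y = np.repeat([0, 1], 200)
X_syn, y_syn = synthesize_class_conditional(X, y, k=2, seed=0)
acc = nearest_centroid_accuracy(X_syn, y_syn, X, y)
dcr = distance_to_closest_record(X_syn, X)
print(f"TSTR accuracy: {acc:.3f}, min DCR: {dcr.min():.4f}")
```

Fitting one mixture per label keeps the synthetic features tied to their class, which is what allows the synthetic records to be used directly for supervised training. The DCR check only rules out exact copies of real records; a positive minimum distance does not by itself establish a formal guarantee such as differential privacy.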