September 28, 2025

Generative AI Models for Synthetic Data Creation in Healthcare and Cybersecurity

Key Points

Generative AI enables high-fidelity synthetic data creation for healthcare and cybersecurity.
Synthetic datasets effectively improved predictive modeling in healthcare and anomaly detection in cybersecurity.
Privacy evaluations showed a significant reduction in re-identification vulnerabilities for synthetic data.
Strict ethical validation ensures responsible use of synthetic datasets in sensitive sectors.

Abstract

The growing use of data-driven solutions in sensitive sectors such as healthcare and cybersecurity is often limited by data scarcity and strict privacy standards. This paper explores how generative artificial intelligence (AI) tools, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs), can be used to produce privacy-preserving synthetic datasets that supplement limited data while preserving confidentiality. A descriptive-analytical research design was adopted, supported by empirical demonstrations in two areas: healthcare, where tabular patient records were used for disease prediction, and cybersecurity, where benign network traffic flows were used for anomaly detection. Synthetic datasets were evaluated on three dimensions: fidelity, which measures similarity to real data; utility, which measures performance in downstream machine learning applications; and privacy, which measures the risk of data memorization or leakage. The findings showed that class-conditional GMMs modeled the distributions of patient features effectively and improved predictive modeling when used with real data, while synthetic benign traffic supported effective anomaly detection in cybersecurity tasks. Privacy evaluations indicated that no individual record was memorized, which reduced re-identification vulnerability. Overall, the paper shows that generative AI can deliver high-fidelity, useful, and privacy-conscious synthetic datasets, offering a scalable solution to data shortage, and it underscores the importance of strict validation, ethical oversight, and control in sensitive data use.
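The abstract does not give implementation details, so the following is only a minimal sketch of how a class-conditional GMM pipeline and the three evaluation dimensions might look. It assumes scikit-learn and NumPy. The toy data, the helper names (fit_class_conditional_gmms, sample_synthetic, min_distance_to_real), the number of mixture components, and the specific proxies (mean gap for fidelity, train-on-synthetic/test-on-real accuracy for utility, exact-copy count for privacy) are all illustrative assumptions, not the author's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture


def fit_class_conditional_gmms(X, y, n_components=2, seed=0):
    """Fit one GMM per class label; returns {label: (model, class_fraction)}."""
    models = {}
    for label in np.unique(y):
        X_c = X[y == label]
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="full", random_state=seed)
        gmm.fit(X_c)
        models[label] = (gmm, len(X_c) / len(X))
    return models


def sample_synthetic(models, n_samples):
    """Draw synthetic rows from each class GMM in proportion to class frequency."""
    X_parts, y_parts = [], []
    for label, (gmm, frac) in models.items():
        n_c = max(1, int(round(frac * n_samples)))
        X_c, _ = gmm.sample(n_c)  # second return value is the component index
        X_parts.append(X_c)
        y_parts.append(np.full(n_c, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def min_distance_to_real(X_syn, X_real):
    """For each synthetic row, Euclidean distance to its closest real row."""
    diffs = X_syn[:, None, :] - X_real[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


# Toy stand-in for a tabular patient dataset (assumed, purely illustrative).
rng = np.random.default_rng(0)
X_real = np.vstack([rng.normal(0.0, 1.0, (200, 3)),
                    rng.normal(3.0, 1.0, (200, 3))])
y_real = np.array([0] * 200 + [1] * 200)

models = fit_class_conditional_gmms(X_real, y_real)
X_syn, y_syn = sample_synthetic(models, n_samples=400)

# Fidelity proxy: largest gap between real and synthetic per-feature means.
mean_gap = np.abs(X_real.mean(axis=0) - X_syn.mean(axis=0)).max()

# Utility proxy: train a classifier on synthetic rows, score it on real rows.
clf = LogisticRegression().fit(X_syn, y_syn)
tstr_accuracy = clf.score(X_real, y_real)

# Privacy proxy: count synthetic rows that exactly copy a real row.
exact_copies = int((min_distance_to_real(X_syn, X_real) == 0.0).sum())

print(f"mean gap: {mean_gap:.3f}, "
      f"TSTR accuracy: {tstr_accuracy:.3f}, exact copies: {exact_copies}")
```

An exact-copy count is a weak privacy check on its own. A fuller evaluation would likely compare the nearest-neighbor distances against those of a held-out real set, but the abstract does not say which privacy metric was used.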


Cite This Study

Diwakar Ramanuj Tripathi (Fri,) studied this question.

synapsesocial.com/papers/68d9052141e1c178a14f4fe5 https://doi.org/10.22214/ijraset.2025.74358
