August 5, 2025 · Open Access

Exploring the Role of Synthetic Data in the Future of AI in Healthcare: A Scoping Review of Frameworks, Challenges, and Implications

Mohammad Ishtiaque Rahman (Thomas More University), Razuan Hossain, Sheikh Mohammad Sayem (Bangladesh Agricultural University)

Key Points

Synthetic data is a transformative tool in healthcare AI, supporting applications such as medical imaging and electronic health records (EHRs).
Key techniques for generating synthetic data include generative adversarial networks and variational autoencoders.
The review identifies challenges such as high computational demands and ethical concerns over data privacy and consent.
Standardized evaluation protocols and clearer regulatory guidance are needed for effective synthetic data use in healthcare.

Abstract

Synthetic data has emerged as a transformative tool in healthcare, particularly in areas such as medical imaging, electronic health records (EHRs), and clinical trial simulation, where data privacy, diversity, and accessibility are critical. This scoping review examines current approaches to synthetic data generation in healthcare, with a focus on AI model training, privacy preservation, and bias mitigation. A comprehensive search of PubMed, IEEE Xplore, and ACM Digital Library yielded 2,906 studies, of which 42 met the inclusion criteria. Key data generation techniques included generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, Bayesian networks, federated learning, recurrent neural networks (RNNs), large language models (LLMs), agent-based models, graph-based generators, and SMOTE-based oversampling. Applications ranged from diagnostic model development to privacy-preserving data sharing and educational simulation. However, the field faces persistent challenges, including inconsistent validation practices, the absence of standard benchmarks, high computational demands, and ethical concerns related to consent and bias. This review underscores the need for standardized evaluation protocols, clearer regulatory guidance, and multidisciplinary collaboration to ensure the safe, equitable, and effective use of synthetic data in healthcare AI.
