May 9, 2026 | Open Access

64 Synthetic data generation using generative AI to support biomedical innovation: A health policy perspective

Rachele Hendricks-Sturrup, Duke Institute for Health Innovation; Maryam Nafie, Duke Institute for Health Innovation

Key Points

This work aims to explore the regulatory landscape and best practices for the quality and reliability of synthetic data created using generative AI for biomedical innovation.
Identified areas where synthetic data holds value for health researchers.
Reviewed regulatory documents and literature to understand current applications of synthetic data.
Developed a risk-based credibility assessment framework aligning with governmental standards.
Synthetic data offers value as a privacy-enhancing technology, data science sandbox, legal navigation mechanism, and method to augment underrepresented subgroups.
Synthetic data also raises ethical and legal concerns regarding privacy, consent, stakeholder engagement, and ownership.
Regulatory bodies like FDA and EMA are examining synthetic data for various medical applications.

Abstract

Objectives/Goals: Synthetic data holds potential for inclusion in medical product development pipelines. We therefore explored the current regulatory and practice landscape to identify best practices for ensuring synthetic data quality, relevance, and reliability in regulatory settings, and for communicating these properties to key stakeholders.

Methods/Study Population: We identified areas in which synthetic data created using generative AI holds the most value for health researchers. We reviewed regulatory documents, published literature, and expert insights to examine how regulators currently use, define, apply, and govern such data. Next, we identified the data management tools, best practices, ethical considerations, and regulatory developments necessary for generating fit-for-use synthetic datasets. Using this information, we developed a risk-based credibility assessment framework that aligns with current governmental standards and can be useful to users of synthetic data derived from generative AI applications in regulatory settings.

Results/Anticipated Results: Synthetic data created using generative AI holds value for health researchers in four key areas, acting as 1) a privacy-enhancing technology, 2) a data science 'sandbox' for training and exploration, 3) a mechanism to navigate legal requirements around data sharing and/or use, and 4) a method to augment underrepresented subgroups in datasets. However, synthetic data also raises ethical and legal concerns, particularly regarding privacy, consent, stakeholder engagement, and ownership. Regulators and health technology assessment bodies, including FDA, EMA, MHRA, and Canada's Drug Agency, are exploring synthetic data to supplement medical datasets, validate external control arms, enhance model performance, and inform regulatory decision-making.

Discussion/Significance of Impact: Our work underscores the need to continue cultivating transparent, ethical, and fit-for-use approaches to synthetic data generation using generative AI. Moving forward, effective synthetic data development and use will require a culture of learning and transparency among regulators, end users, and those involved in data generation and exchange.
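The abstract does not spell out the details of the authors' framework. As a rough, non-authoritative illustration of how a risk-based credibility assessment can work, the sketch below borrows the two factors used in risk-informed credibility approaches such as ASME V&V 40 and FDA's 2025 draft guidance on AI for drug regulatory decision-making: model influence (how much the synthetic data drives the decision) and decision consequence (how bad a wrong decision would be). The 3-point scale, the additive cut-offs, the function names `risk_tier` and `credibility_plan`, and the evidence lists are all hypothetical assumptions for illustration, not the authors' method or any regulator's published matrix.

```python
from enum import IntEnum


class Level(IntEnum):
    """Ordinal rating used for both risk factors (hypothetical 3-point scale)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Hypothetical evidence expectations per risk tier; illustrative only.
EVIDENCE_BY_TIER = {
    Level.LOW: ["document generator and training data provenance"],
    Level.MEDIUM: [
        "document generator and training data provenance",
        "compare marginal and joint distributions with source data",
        "assess re-identification risk",
    ],
    Level.HIGH: [
        "document generator and training data provenance",
        "compare marginal and joint distributions with source data",
        "assess re-identification risk",
        "validate downstream results against held-out real data",
        "pre-specify acceptance criteria with the regulator",
    ],
}


def risk_tier(influence: Level, consequence: Level) -> Level:
    """Combine the two factors into a risk tier with a simple additive rule.

    The cut-offs below are an assumption for illustration, not a
    published risk matrix.
    """
    score = int(influence) + int(consequence)  # ranges from 0 to 4
    if score <= 1:
        return Level.LOW
    if score == 2:
        return Level.MEDIUM
    return Level.HIGH


def credibility_plan(influence: Level, consequence: Level) -> list[str]:
    """Return the (hypothetical) evidence activities for a given context of use."""
    return EVIDENCE_BY_TIER[risk_tier(influence, consequence)]


if __name__ == "__main__":
    # Synthetic data used only as a training 'sandbox': low influence, low consequence.
    print(risk_tier(Level.LOW, Level.LOW).name)
    # Synthetic external control arm supporting an efficacy claim: high, high.
    for step in credibility_plan(Level.HIGH, Level.HIGH):
        print("-", step)
```

The point of the design is that the evidence burden scales with risk: a sandbox use case needs little more than provenance documentation, while synthetic data that directly supports a regulatory decision would need fidelity, privacy, and external validation evidence agreed on in advance.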
