What is the clinical evidence from this study?

Study design: Other. Population: Breast cancer. Intervention: AI synthetic data generation (Nemotron-Personas and Synthea). Primary outcome: Development of a synthetic dataset of patient personas augmented with detailed MRFs and longitudinal health parameters.

What question did this study set out to answer?

The central aim is to develop a privacy-preserving framework to simulate personalized patient journeys for early cancer detection, particularly for breast cancer.

February 22, 2026

Abstract PS3-06-25: Simulating Personalized Patient Journeys for Early Cancer Detection Using Artificial Intelligence Synthetic Data

Key Result

A high-fidelity, privacy-preserving synthetic dataset of diverse patient personas with longitudinal health trajectories was developed to evaluate AI models for early breast cancer detection.

Key Points

Developed a synthetic dataset of patient personas enriched with modifiable risk factors (MRFs) and longitudinal health parameters.
Used NVIDIA's Nemotron-Personas for demographic diversity and augmented the personas with Synthea for realistic longitudinal health modeling.
Designed the dataset to support future "what-if" analyses that test how behavioral changes and clinical interventions affect cancer risk.
Produced a high-fidelity synthetic dataset of diverse patient personas with varied health trajectories.
Planned simulations would explore scenarios such as the effect of smoking cessation or regular screening on breast cancer risk.
Laid the groundwork for precision prevention strategies informed by synthetic data.
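
The "what-if" idea in the key points can be illustrated with a minimal Python sketch. The abstract does not publish a schema or a risk model, so everything here is an assumption made for illustration: the `RiskFactors` and `Persona` classes, their field names, the example values, and the `what_if` and `changed_factors` helpers are all hypothetical. The sketch only builds a counterfactual copy of a persona and reports which factors changed; it does not estimate cancer risk.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RiskFactors:
    """Hypothetical modifiable risk factors (MRFs) tracked for one persona."""
    smoker: bool
    alcohol_drinks_per_week: float
    activity_minutes_per_week: float
    sleep_hours_per_night: float


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona record: an identifier, an age, and its MRFs."""
    persona_id: str
    age: int
    risk_factors: RiskFactors


def what_if(persona: Persona, **changes) -> Persona:
    """Return a counterfactual copy of `persona` with the given MRFs changed."""
    known = {f.name for f in fields(RiskFactors)}
    unknown = set(changes) - known
    if unknown:
        raise ValueError(f"unknown risk factors: {sorted(unknown)}")
    # Frozen dataclasses are immutable, so the baseline persona stays intact.
    return replace(persona, risk_factors=replace(persona.risk_factors, **changes))


def changed_factors(baseline: Persona, counterfactual: Persona) -> list:
    """List (name, before, after) for every MRF that differs between the two."""
    diffs = []
    for f in fields(RiskFactors):
        before = getattr(baseline.risk_factors, f.name)
        after = getattr(counterfactual.risk_factors, f.name)
        if before != after:
            diffs.append((f.name, before, after))
    return diffs


# Made-up example values, not data from the study.
baseline = Persona("persona-001", 52, RiskFactors(True, 7.0, 60.0, 6.0))
quit_smoking = what_if(baseline, smoker=False)
print(changed_factors(baseline, quit_smoking))  # [('smoker', True, False)]
```

Keeping the baseline immutable and returning a modified copy means both versions of the persona could be passed to whatever downstream simulation is eventually built, and their trajectories compared side by side.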

Structured PICO

Population

Synthetic cohort dataset of patient personas augmented with detailed Modifiable Risk Factors (MRFs) and longitudinal health parameters, specifically modeling breast cancer progression.

Intervention

Development of a synthetic dataset using NVIDIA's Nemotron-Personas and Synthea to simulate personalized patient journeys and modifiable risk factors.
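
The abstract does not say how persona demographics are handed to Synthea. One possible approach, sketched here purely as an assumption, is to constrain a Synthea run with its command-line options for seed (`-s`), population size (`-p`), gender (`-g`), and age range (`-a`), followed by a US state name. The `synthea_command` helper and the persona keys `age`, `sex`, and `state` are hypothetical, and the example values are made up.

```python
def synthea_command(persona: dict, seed: int) -> list:
    """Build a run_synthea argument list for one persona (illustrative only)."""
    # Assumed persona keys: "age", "sex", "state". Not a schema from the study.
    gender = {"female": "F", "male": "M"}[persona["sex"].strip().lower()]
    age = int(persona["age"])
    return [
        "./run_synthea",
        "-s", str(seed),        # fixed seed so the run is repeatable
        "-p", "1",              # generate a single patient
        "-g", gender,           # restrict the patient to the persona's sex
        "-a", f"{age}-{age}",   # restrict the patient to the persona's age
        persona["state"],       # Synthea takes a full US state name
    ]


# Made-up persona, not a record from Nemotron-Personas or the study.
persona = {"age": 52, "sex": "Female", "state": "Texas"}
command = synthea_command(persona, seed=42)
print(" ".join(command))  # ./run_synthea -s 42 -p 1 -g F -a 52-52 Texas
```

The resulting list could be passed to `subprocess.run` from a Synthea checkout. Synthea still samples the remaining demographics itself, so linking its output back to the originating persona would need an extra step that this sketch does not cover.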

Outcome

Establishment of a high-fidelity, privacy-preserving synthetic dataset of diverse patient personas with longitudinal health trajectories relevant to breast cancer risk.

The development of a synthetic dataset with longitudinal health trajectories and modifiable risk factors provides a privacy-preserving testbed for training AI models in early breast cancer detection.

Abstract

Developing and validating AI/ML models for early cancer detection is significantly hampered by the scarcity, sensitivity, and ethical complexities associated with real patient data. Synthetic data offers a compelling solution, providing a controlled environment for rigorous model development and testing without compromising privacy. Furthermore, Modifiable Risk Factors (MRFs), such as diet, physical activity, alcohol consumption, sleep patterns, and smoking, are critical determinants of cancer risk, including breast cancer. The ability to simulate the dynamic interplay of these factors and their impact on health outcomes is crucial for designing effective personalized prevention strategies.

This study aims to establish a robust, privacy-preserving framework for simulating personalized patient journeys to advance early cancer detection, with a particular focus on breast cancer. Our core objectives are: 1) To develop a rich synthetic dataset of patient personas augmented with detailed MRFs and longitudinal health parameters. 2) To enable "what-if" scenario analysis, allowing for the simulation of the impact of behavioral changes and clinical interventions on individual cancer risk trajectories.

This is a synthetic cohort dataset generated with artificial intelligence, designed to serve as a foundational resource for building persona simulation engines. We initiate persona generation by adapting NVIDIA's Nemotron-Personas dataset, leveraging its inherent demographic diversity (e.g., age, gender, occupation, geographic distribution) as a robust base. These generic personas are then augmented using Synthea, an open-source synthetic electronic health record (EHR) generator. We specifically utilize Synthea's capabilities to model realistic, longitudinal patient histories, incorporating a dedicated breast cancer module to simulate disease progression and relevant clinical events over time. Each persona can be customized with a comprehensive set of parameters, including nuanced dietary patterns, specific lifestyle behaviors (e.g., physical activity levels, sleep patterns), social factors, and simulated environmental risks, with a strong emphasis on quantifiable MRFs.

This rich data will eventually make it possible to simulate dynamic patient journeys. It will allow researchers to explore complex "what-if" questions by adjusting MRFs and observing their effects on cancer risk progression. For example, we could model scenarios such as "What if this patient had annual mammogram screenings for the past three years?" or "What if that patient stops smoking today?"

The result is a high-fidelity, privacy-preserving synthetic dataset of diverse patient personas with rich, longitudinal health trajectories relevant to breast cancer risk. This dataset serves as a useful testbed for developing and evaluating AI/ML models for early cancer detection. Future work will add "what-if" simulation capabilities that are expected to provide valuable insights into the personalized impact of MRF modifications and early interventions, laying the groundwork for future precision prevention strategies in oncology.

Citation Format: N. H. Borges, L. E. Silva e Oliveira, I. C. Cazagranda, C. B. de Albuquerque, O. Marques, C. d. Costa. Simulating Personalized Patient Journeys for Early Cancer Detection Using Artificial Intelligence Synthetic Data [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2025; 2025 Dec 9-12; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2026;32(4 Suppl):Abstract nr PS3-06-25.


Cite This Study

Borges et al. (2026) reported a synthetic data development study in breast cancer. AI synthetic data generation (Nemotron-Personas and Synthea) was used to develop a synthetic dataset of patient personas augmented with detailed MRFs and longitudinal health parameters. The result was a high-fidelity, privacy-preserving synthetic dataset of diverse patient personas with longitudinal health trajectories, intended for evaluating AI models for early breast cancer detection.

synapsesocial.com/papers/699a9dcd482488d673cd3f08 https://doi.org/10.1158/1557-3265.sabcs25-ps3-06-25
