Does synthetic data generation produce consistent results compared to federated analysis for evaluating cardiovascular health across international jurisdictions?
79,293 individuals from the Canadian Community Health Survey (CCHS) 2014 (n=63,522) and the Austria Health Interview Survey (ATHIS) 2014 (n=15,771)
Synthetic data generation (SDG) of the Canadian dataset pooled with the real Austrian dataset
Federated analysis on the original source datasets (DataSHIELD)
Consistency of regression results (parameter estimates) between the two approaches for assessing country-level differences in the role of sex on cardiovascular health (CVH) using a modified CANHEART index
Synthetic data generation provides a highly efficient and privacy-preserving alternative to federated analysis for conducting international comparative studies in cardiovascular health.
Sharing health data for research across international jurisdictions has been challenging because of privacy concerns. Two privacy-enhancing technologies that can enable such sharing are synthetic data generation (SDG) and federated analysis, but their relative strengths and weaknesses have not yet been evaluated. In this study, we compared SDG with federated analysis as a way to enable such international comparative studies. The objective of the analysis was to assess country-level differences in the role of sex on cardiovascular health (CVH) using a pooled dataset of Canadian and Austrian individuals. The Canadian data were synthesized and sent to the Austrian team for analysis. The utility of the pooled (synthetic Canadian + real Austrian) dataset was evaluated by comparing the regression results from the two approaches. The privacy of the Canadian synthetic data was assessed using a membership disclosure test, which yielded an F1 score of 0.001, indicating low privacy risk. The outcome variable of interest was CVH, calculated through a modified CANHEART index. The main and interaction effect parameter estimates of the federated and pooled analyses were consistent and pointed in the same direction. It took approximately one month to set up the synthetic data generation platform and generate the synthetic data, whereas it took over 1.5 years to set up the federated analysis system. Synthetic data generation can be an efficient and effective tool for enabling multi-jurisdictional studies while addressing privacy concerns.
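To make the reported membership disclosure metric more concrete, the sketch below shows one common way an F1 score can be computed for an adversary who tries to guess which records were used to train a synthesizer. It is only an illustration: the abstract does not describe the authors' exact test procedure, and the function name `membership_f1` and the record IDs are hypothetical.

```python
def membership_f1(true_members, claimed_members):
    """F1 score of an adversary's membership claims.

    true_members:    IDs of records that really were in the training data.
    claimed_members: IDs the adversary claims were in the training data.
    Both arguments are hypothetical inputs used for illustration only.
    """
    true_members = set(true_members)
    claimed_members = set(claimed_members)

    # Correct claims: records the adversary flagged that really were members.
    true_positives = len(true_members & claimed_members)
    if true_positives == 0:
        return 0.0  # no correct claims, so precision and recall are both 0

    precision = true_positives / len(claimed_members)
    recall = true_positives / len(true_members)
    return 2 * precision * recall / (precision + recall)


# Made-up record IDs, not data from the study.
in_training = {1, 2, 3, 4}
adversary_claims = {3, 4, 5, 6, 7, 8}
score = membership_f1(in_training, adversary_claims)
print(round(score, 3))
```

Under this formulation, a score near 0 means the adversary's membership guesses are rarely correct, which is consistent with how the abstract interprets its reported value of 0.001 as low privacy risk.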
Zahra Azizi
University of Ottawa
Simon David Lindner
Yumika Shiba
Scientific Reports
McGill University
Karolinska Institutet
University of Alberta
Azizi et al. studied this question.
synapsesocial.com/papers/6a0fd8489e54838161fd44d4 — DOI: https://doi.org/10.1038/s41598-023-38457-3