Data anonymization is a widely recognized approach to mitigating privacy concerns when sharing sensitive health information for research purposes. This work examines its practical implementation by profiling nine popular repositories of anonymized data, which have collectively been referenced in over 15,000 academic articles within the last five years. The repositories include publicly funded and commercial services, repositories operated by individual healthcare providers, and large-scale research programs. The anonymization process is similar across these sources. First, identifiers are removed, including directly identifying attributes such as names, addresses, and insurance numbers, as well as attributes like birthdates and postal codes. Free-text fields, when present, are either stripped of these identifiers or removed completely. Another common practice is to generalize or shift dates. Some repositories additionally use pseudonyms to link data from different sources, and trusted research environments to control the setting in which the data is processed. Only one repository explicitly mentioned using syntactic privacy models to enforce risk thresholds. The existence and frequent referencing of these popular repositories suggest that anonymization can be adopted on a broad scale and that the data provided is of high value for research.
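The steps described above can be illustrated with a minimal Python sketch. It is not the method of any of the profiled repositories: the field names, the secret key, the two-digit postal code prefix, and the one-year shift window are all hypothetical choices made for illustration. The abstract does not say which syntactic privacy model is used, so the sketch assumes k-anonymity as one well-known example and only measures the smallest group of records sharing the same attribute values.

```python
import hashlib
import hmac
from collections import Counter
from datetime import date, timedelta

# Hypothetical field names; real repositories define their own schemas.
DIRECT_IDENTIFIERS = {"name", "address", "insurance_number"}
# Assumption: a secret key held only by the data custodian.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonym(patient_id: str) -> str:
    """Keyed hash, so records from different sources can be linked without the raw ID."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]


def date_offset(patient_id: str, max_days: int = 365) -> timedelta:
    """Deterministic per-patient shift in [-max_days, max_days].

    Using the same offset for all of a patient's dates preserves the
    intervals between that patient's events.
    """
    digest = hmac.new(SECRET_KEY, b"shift:" + patient_id.encode(), hashlib.sha256).digest()
    days = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return timedelta(days=days)


def anonymize(record: dict) -> dict:
    """Return an anonymized copy of one record; the input is left unchanged."""
    pid = record["patient_id"]
    # Remove direct identifiers and the raw patient ID.
    out = {k: v for k, v in record.items()
           if k not in DIRECT_IDENTIFIERS and k != "patient_id"}
    out["pseudonym"] = pseudonym(pid)
    # Generalize the birthdate to the year of birth.
    out["birth_year"] = out.pop("birthdate").year
    # Generalize the postal code to its first two characters.
    out["postal_code"] = out["postal_code"][:2] + "***"
    # Shift the event date by the per-patient offset.
    out["admission_date"] = out["admission_date"] + date_offset(pid)
    return out


def smallest_group(records: list, attributes: list) -> int:
    """Size of the smallest group of records sharing the same attribute values.

    A dataset is k-anonymous with respect to these attributes if this is >= k.
    """
    groups = Counter(tuple(r[a] for a in attributes) for r in records)
    return min(groups.values())
```

A release rule could then compare `smallest_group(records, ["birth_year", "postal_code"])` against a chosen threshold and generalize further when the threshold is not met. The keyed hash is used instead of a plain hash so that pseudonyms cannot be recomputed by someone who knows a patient ID but not the key.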
Thierry Meurers
Berlin Institute of Health at Charité - Universitätsmedizin Berlin
Karen Otte
Berlin Institute of Health at Charité - Universitätsmedizin Berlin
Hammam Abu Attieh
University Hospital of Lausanne
Berlin Institute of Health at Charité - Universitätsmedizin Berlin
Meurers et al. studied this question.
synapsesocial.com/papers/689dfe90d61984b91e13bbae — DOI: https://doi.org/10.3233/shti250831