August 7, 2025

Evaluation of Anonymization Practices Among Popular Repositories of Data for Biomedical Research

Key Points

MAIN FINDING: Anonymization practices across nine popular repositories of biomedical data are largely similar.
KEY EVIDENCE: The repositories, referenced in over 15,000 academic articles within the last five years, remove direct identifiers and commonly generalize or shift dates; only one explicitly mentions using syntactic privacy models to enforce risk thresholds.
APPROACH: The evaluation profiled publicly funded and commercial services, repositories operated by individual healthcare providers, and large-scale research programs for their data anonymization techniques.
SIGNIFICANCE: The presence and frequent referencing of these repositories suggest that anonymization is adoptable on a broad scale and that the data provided is of high value for research.

Abstract

Data anonymization is a widely recognized approach to mitigate privacy concerns when sharing sensitive health information for research purposes. This work aims to examine its practical implementation by profiling nine popular repositories of anonymized data, which have collectively been referenced in over 15,000 academic articles within the last five years. These repositories include publicly funded as well as commercial services, repositories operated by individual healthcare providers, and large-scale research programs. The anonymization process across these sources was similar. Initially, identifiers are removed, including potentially directly identifiable attributes such as names, addresses, and insurance numbers, as well as attributes like birthdates and postal codes. Free-text fields, when present, are also stripped of these identifiers or completely removed. Another common practice involves generalizing or shifting dates. Some repositories additionally use pseudonyms for linking data from various sources and trusted research environments for controlling the setting in which the data is processed. Only one repository explicitly mentioned using syntactic privacy models to enforce risk thresholds. The presence and frequent referencing of popular repositories suggest that anonymization is adoptable on a broad scale and the data provided is of high value for research.
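The steps described in the abstract (removing identifiers, dropping free text, shifting or generalizing dates, and linking via pseudonyms) can be illustrated with a minimal Python sketch. This is not the procedure of any of the profiled repositories; the field names (`patient_id`, `name`, `notes`, and so on), the HMAC-based pseudonym, and the per-patient shift of up to 365 days are all assumptions chosen for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical field names; real repositories define their own schemas.
REMOVED_FIELDS = {"name", "address", "insurance_number",
                  "birthdate", "postal_code", "notes"}


def pseudonym(patient_id: str, secret: bytes) -> str:
    """Keyed hash, so records can be linked across sources without the raw ID."""
    mac = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def date_offset(patient_id: str, secret: bytes, max_days: int = 365) -> timedelta:
    """Stable per-patient shift of 1..max_days days derived from a keyed hash."""
    mac = hmac.new(secret, b"shift:" + patient_id.encode("utf-8"), hashlib.sha256)
    days = int.from_bytes(mac.digest()[:4], "big") % max_days + 1
    return timedelta(days=days)


def anonymize(record: dict, secret: bytes, generalize_dates: bool = False) -> dict:
    """Drop identifying fields, replace the ID, and shift or generalize dates."""
    pid = record["patient_id"]
    shift = date_offset(pid, secret)
    out = {"pseudonym": pseudonym(pid, secret)}
    for key, value in record.items():
        if key == "patient_id" or key in REMOVED_FIELDS:
            continue  # identifiers and free text are removed entirely
        if isinstance(value, date):
            # Either keep only the year, or shift by the per-patient offset.
            out[key] = value.year if generalize_dates else value - shift
        else:
            out[key] = value
    return out
```

Because every date of one patient is moved by the same offset, intervals such as length of stay are preserved while the calendar dates are not. A possible call, with made-up values:

```python
record = {
    "patient_id": "p1", "name": "A. Example", "postal_code": "10115",
    "birthdate": date(1980, 1, 2), "admission": date(2020, 1, 1),
    "discharge": date(2020, 1, 5), "diagnosis": "I10", "notes": "free text",
}
shared = anonymize(record, secret=b"demo-key")
```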

Authors

Thierry Meurers

Berlin Institute of Health at Charité - Universitätsmedizin Berlin

Karen Otte

Berlin Institute of Health at Charité - Universitätsmedizin Berlin

Hammam Abu Attieh

Institutions

University Hospital of Lausanne

Berlin Institute of Health at Charité - Universitätsmedizin Berlin
