A set of deidentified patient data compliant with the Health Insurance Portability and Accountability Act (HIPAA) was compiled, the data lost as a function of unique data elements (UDEs) were measured, and the deidentified data were tested for potential for reidentification. After approval by the institutional review board of an integrated health system, a limited-data set was created by querying the health system's pharmacy, administrative, and financial files for patients discharged between January 1 and December 31, 2000. Using the HIPAA "safe-harbor" method, this limited-data set was converted into a deidentified-data table for future statistical analysis, and UDEs in both data sets were identified and quantified. Unique combinations of commonly available data were also identified. The limited-data set, representing 4,738 patient discharges, contained 810,456 UDEs in 322,657 records organized into four data tables (demographics, diagnoses, medication orders, and laboratory test results). The deidentified-data table, representing 4,722 discharges, contained 562,171 UDEs in 128 data-type columns in a single data table. About 31% of the data volume was lost. Much of the information lost was of the type that is of special interest to researchers (e.g., time between episodes of care, ages of >89 years). A study suggested that deidentified patient data with a reasonable degree of protection against reidentification were less complete than may be necessary for good research.
Clause et al. studied this question.
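The reported loss of about 31% can be checked against the UDE counts given in the abstract. The sketch below assumes that "data volume" refers to the number of UDEs, which the abstract does not state explicitly; the function name `fraction_lost` is illustrative and not taken from the paper.

```python
# UDE counts as reported in the abstract.
limited_udes = 810_456       # limited-data set (four tables)
deidentified_udes = 562_171  # deidentified-data table (single table)


def fraction_lost(before: int, after: int) -> float:
    """Share of unique data elements removed by deidentification.

    Assumption: "data volume" is measured as the UDE count.
    """
    return (before - after) / before


lost = limited_udes - deidentified_udes
share = fraction_lost(limited_udes, deidentified_udes)

print(f"UDEs lost: {lost:,}")
print(f"Fraction lost: {share:.1%}")
```

Under that assumption the difference is 248,285 UDEs, or roughly 30.6% of the limited-data set, which is consistent with the rounded figure of about 31%.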