BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: A Generic Method for Assessing the Quality of De-Identified Health Data.

Citation Type: Journal Article

Publication Year: 2016

PMID: 27577394

Abstract: Data sharing plays an important role in modern biomedical research. Due to the inherent sensitivity of health data, patient privacy must be protected. De-identification means transforming a dataset so that it becomes extremely difficult for an attacker to link its records to identified individuals. This can be achieved with different types of data transformations. Because transformation affects the information content of a dataset, the increase in privacy must be balanced against the decrease in data quality. To this end, models for measuring both aspects are needed. Non-Uniform Entropy is a data quality model that is frequently recommended for de-identifying health data. In this work we show that it cannot be used in a meaningful way to measure the quality of data that has been transformed with several important types of data transformation. We introduce a generic variant that overcomes this limitation. We performed experiments with real-world datasets, which show that our method provides a unified framework in which the quality of differently transformed data can be compared to find a good or even optimal solution to a given data de-identification problem. We have implemented our method in ARX, an open-source anonymization tool for biomedical data.

Url: http://www.ncbi.nlm.nih.gov/pubmed/27577394

User Submitted?: No

Authors: Prasser, Fabian; Bild, Raffael; Kuhn, Klaus A

Periodical (Full): Studies in health technology and informatics

Issue:

Volume: 228

Pages: 312-316

Data Collections: IPUMS Health Surveys - NHIS

Topics: Health, Methodology and Data Collection

Countries:
