
BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Instance-Based Learning with l-diversity

Citation Type: Journal Article

Publication Year: 2017

Abstract: Corporations are retaining ever-larger corpuses of personal data; the frequency of breaches and corresponding privacy impact have been rising accordingly. One way to mitigate this risk is through use of anonymized data, limiting the exposure of individual data to only where it is absolutely needed. This would seem particularly appropriate for data mining, where the goal is generalizable knowledge rather than data on specific individuals. In practice, corporate data miners often insist on original data, for fear that they might "miss something" with anonymized or differentially private approaches. This paper provides a theoretical justification for the use of anonymized data. Specifically, we show that a k-nearest neighbor classifier trained on anatomized data satisfying l-diversity should be expected to do as well as on the original data. Anatomized data preserves all attribute values, but introduces uncertainty in the mapping between identifying and sensitive values, thus satisfying l-diversity. The theoretical effectiveness of the proposed approach is validated using several publicly available datasets, showing that we outperform the state of the art for nearest neighbor classification using training data protected by k-anonymity, and are comparable to learning on the original data.

Url: http://www.tdp.cat/issues16/tdp.a273a17.pdf

User Submitted?: No

Authors: Mancuhan, Koray; Clifton, Chris

Periodical (Full): Transactions on Data Privacy

Issue:

Volume: 10

Pages: 203-235

Data Collections: IPUMS USA

Topics: Education

Countries: United States
