
BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Differentially Private Projected Histograms of Multi-Attribute Data for Classification

Citation Type: Miscellaneous

Publication Year: 2015

Abstract: In this paper, we tackle the problem of constructing a differentially private synopsis for classification analyses. Several state-of-the-art methods follow the structure of existing classification algorithms and are all iterative. This is suboptimal because of locally optimal choices and because the privacy budget is divided across many sequentially composed steps. Instead, we propose PrivPfC, a new differentially private method for releasing data for classification. The key idea is to privately select an optimal partition of the underlying dataset in a single step, using the given privacy budget. Given a dataset and a privacy budget, PrivPfC constructs a pool of candidate grids in which the number of cells of each grid is below a data-aware and privacy-budget-aware threshold. Next, PrivPfC selects an optimal grid via the exponential mechanism, using a novel quality function that minimizes the expected number of records misclassified by a histogram classifier built on the published grid. Finally, PrivPfC injects noise into each cell of the selected grid and releases the noisy grid as the private synopsis of the data. If the size of the candidate grid pool exceeds the processing-capability threshold set by the data curator, we add a step at the beginning of PrivPfC that privately prunes the set of attributes. We introduce a modified χ² quality function with low sensitivity and use it to evaluate an attribute's relevance to the classification label variable. Through extensive experiments on real datasets, we demonstrate PrivPfC's superiority over the state-of-the-art methods.
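The grid-selection and noise-injection steps described in the abstract can be illustrated with a small Python sketch. It is not PrivPfC's implementation and makes several simplifying assumptions: attributes are categorical and encoded as integers 0..k-1, the label set is public, neighbouring datasets differ by adding or removing one record, a fixed cell limit `max_cells` stands in for the paper's data-aware and budget-aware threshold, and the quality function is a simplified stand-in (the negated number of records a majority-vote histogram classifier on the grid would misclassify, which has sensitivity 1) rather than the paper's expected-misclassification function. The χ² attribute-pruning step is omitted. All function and parameter names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter, defaultdict


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def cell_label_counts(records, attrs):
    # Project each (features, label) record onto the attribute subset `attrs`
    # and count labels per resulting grid cell.
    counts = defaultdict(Counter)
    for features, label in records:
        counts[tuple(features[a] for a in attrs)][label] += 1
    return counts


def misclassified(records, attrs):
    # Records a majority-vote histogram classifier on this grid gets wrong.
    # Adding or removing one record changes this count by at most 1.
    counts = cell_label_counts(records, attrs)
    return sum(sum(c.values()) - max(c.values()) for c in counts.values())


def candidate_grids(domain_sizes, max_cells):
    # Hypothetical pool: every non-empty attribute subset whose projected grid
    # has at most max_cells cells. Uses only public domain sizes, so no budget is spent.
    n = len(domain_sizes)
    return [attrs
            for k in range(1, n + 1)
            for attrs in itertools.combinations(range(n), k)
            if math.prod(domain_sizes[a] for a in attrs) <= max_cells]


def private_projected_histogram(records, domain_sizes, labels, max_cells,
                                eps_select, eps_noise, rng):
    grids = candidate_grids(domain_sizes, max_cells)
    # Exponential mechanism: P(grid) proportional to exp(eps * q / (2 * sensitivity)),
    # with q = -misclassified and sensitivity 1. Subtracting the best score
    # leaves the probabilities unchanged and avoids overflow.
    scores = [-misclassified(records, g) for g in grids]
    best = max(scores)
    weights = [math.exp(eps_select * (s - best) / 2.0) for s in scores]
    chosen = rng.choices(grids, weights=weights, k=1)[0]
    # Laplace noise on every (cell, label) count, including empty cells,
    # so the released cells do not reveal which value combinations occur.
    counts = cell_label_counts(records, chosen)
    noisy = {}
    for cell in itertools.product(*(range(domain_sizes[a]) for a in chosen)):
        for label in labels:
            noisy[(cell, label)] = counts[cell][label] + laplace(1.0 / eps_noise, rng)
    return chosen, noisy


if __name__ == "__main__":
    toy = [((0, 1, 0), "yes"), ((0, 0, 1), "no"), ((1, 1, 0), "yes"), ((1, 0, 1), "no")]
    grid, noisy = private_projected_histogram(
        toy, [2, 2, 2], ["yes", "no"], max_cells=4,
        eps_select=0.5, eps_noise=0.5, rng=random.Random(0))
    print("selected attributes:", grid)
    for key, value in sorted(noisy.items()):
        print(key, round(value, 2))
```

Because the selection and noise steps compose sequentially, this sketch spends eps_select + eps_noise in total. Splitting the budget between exactly two steps, rather than across many iterative steps, is the design point the abstract contrasts with iterative methods.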

Url: http://arxiv.org/pdf/1504.05997.pdf

User Submitted?: No

Authors: Su, Dong; Cao, Jianneng; Li, Ninghui

Publisher: Purdue University

Data Collections: IPUMS USA

Topics: Methodology and Data Collection, Other

Countries:
