Full Citation
Title: Differentially Private Data Publishing for Data Analysis
Citation Type: Dissertation/Thesis
Publication Year: 2016
ISBN:
ISSN:
DOI:
NSFID:
PMCID:
PMID:
Abstract: In the information age, vast amounts of sensitive personal information are collected by companies, institutions and governments. A key technological challenge is how to design mechanisms for effectively extracting knowledge from data while preserving the privacy of the individuals involved. In this dissertation, we address this challenge from the perspective of differentially private data publishing. Firstly, we propose PrivPfC, a differentially private method for releasing data for classification. The key idea underlying PrivPfC is to privately select, in a single step, a grid, which partitions the data domain into a number of cells. This selection is done using the exponential mechanism with a novel quality function, which maximizes the expected number of correctly classified records by a histogram classifier. PrivPfC supports both the binary classification as well as the multiclass classification. Secondly, we study the problem of differentially private k-means clustering. We develop techniques to analyze the empirical error behaviors of the existing interactive and non-interactive approaches. Based on the analysis, we propose an improvement of the DPLloyd algorithm which is a differentially private version of the Lloyd algorithm and propose a non-interactive approach EUGkM which publishes a differentially private synopsis for k-means clustering. We also propose a hybrid approach that combines the advantages of the improved version of DPLloyd and EUGkM. Finally, we investigate the . . .
Url: https://docs.lib.purdue.edu/open_access_dissertations/1005
User Submitted?: No
Authors: Su, Dong
Institution: Purdue University
Department:
Advisor:
Degree:
Publisher Location:
Pages:
Data Collections: IPUMS USA
Topics: Methodology and Data Collection
Countries: