Full Citation
Title: Methods and technologies for the secure collection, sanitization, processing and release of data
Citation Type: Dissertation/Thesis
Publication Year: 2021
ISBN:
ISSN:
DOI:
NSFID:
PMCID:
PMID:
Abstract: The last decade has seen a significant increase in the usage of cloud services. This trend is related not only to the low cost and high availability of cloud providers, but also to the ease of use and reliability of the service over time. Digital devices rapidly become obsolete and are subject to failures, so outsourcing data reduces the risks linked to data loss. Although uploading data to the cloud has advantages, it also raises several security and privacy challenges. The experience gained by the research and industry communities attests that simply hiding data behind a cryptographic transformation is not enough to ensure an adequate level of protection. A cloud-oriented architecture has a wide attack surface, so it is necessary to pay attention to the whole data lifecycle, from data collection and sanitization, to storage and processing, and finally to release. This doctoral thesis analyzes each of these stages, proposing solutions that push forward the current state of the art. The first part of the thesis deals with the collection of data, in particular in the mobile scenario. The mobile environment is especially relevant because smartphones are network-connected devices with limited storage and the ability to sense and log confidential data and Personally Identifiable Information. To access this information, an application must be granted the proper permission. Yet all the components running inside the application (whether trusted or included from third parties) share the same execution environment, and thus have the same visibility and access constraints. This is a limitation of current mobile operating systems. Focusing on Android, which is open source and available to researchers, we propose a set of modifications that achieve internal application compartmentalization by leveraging the Mandatory Access Control (MAC) layer.
With this approach, the developer can add a policy module to the application to confine each component, effectively restricting access to the application's internal storage and services, and isolating vulnerability-prone components. After the data are collected, a user or a company may apply sanitization to them before they are uploaded to the cloud or released to a consumer. Data sanitization (or anonymization) is a process by which data are irreversibly altered so that a subject referenced within the data cannot be identified, given a certain security parameter, while the data remain practically useful. The second part of the thesis presents an approach based on k-anonymity and ℓ-diversity to apply data sanitization over large collections of sensor data. The approach can be applied in parallel in a distributed environment and is characterized by limited information loss. The third part of the thesis investigates the storage and processing stages. In this scenario, the cloud provider is typically considered honest-but-curious: it is assumed to always comply with the requests issued by the user, but it may abuse its access to the information provided. Hence, the goal is to support the execution of queries over outsourced data with a guarantee that the cloud provider does not have access to the data content. Unfortunately, the simple use of deterministic encryption does not offer real protection against a curious provider, because the encrypted data maintain the same distribution as the original data. The approach presented in this thesis is applicable to relational data, and it enables the execution of queries that evaluate equality and range conditions over attributes. The data are stored encrypted at the server in equally large blocks, each containing a fixed number of tuples. The blocks are managed by the server as single atomic units, and accessed through an encrypted multidimensional index also stored by the server.
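The frequency leak of deterministic encryption can be illustrated with a minimal Python sketch. The values and the keyed hash used as a stand-in for a deterministic cipher are illustrative only, not the scheme discussed in the thesis:

```python
import hashlib
from collections import Counter

def det_encrypt(value: str, key: bytes) -> str:
    # Deterministic stand-in: the same plaintext always maps to the same
    # ciphertext. (A keyed hash is not reversible encryption; it is used
    # here only to model the deterministic property.)
    return hashlib.sha256(key + value.encode()).hexdigest()[:16]

plaintexts = ["flu", "flu", "flu", "covid", "covid", "asthma"]
key = b"hypothetical-key"
ciphertexts = [det_encrypt(p, key) for p in plaintexts]

# The multiset of frequencies survives encryption: an observer who knows
# the plaintext distribution can match ciphertexts to values by frequency.
print(sorted(Counter(plaintexts).values()))   # [1, 2, 3]
print(sorted(Counter(ciphertexts).values()))  # [1, 2, 3]
```

Because the two frequency profiles are identical, a curious provider that knows (or can estimate) the distribution of the original attribute can re-identify encrypted values without breaking the cipher itself.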
In this way, the cloud provider is unable to identify the single items stored within each block. Local maps are kept by the client to search the index efficiently. The proposed approach provides perfect indistinguishability against an attacker with access to the stored data. This is achieved by applying probabilistic encryption to the blocks storing the data, and by destroying (i.e., flattening) the frequencies in the encrypted index. The index is built as an evolution of the partitioning technique presented in the second part of the thesis to sanitize the dataset. The last part of the thesis addresses the data release stage. The goal is to provide a solution that can schedule the release of chunks or partitions of data at a future point in time. Due to the confidential nature of the data, we cannot rely on any honesty assumption. Hence, we move to a decentralized environment in which the parties (i.e., the network nodes) are mutually distrusting. In this setting, we model the parties as rational (i.e., driven purely by economic interest), and we propose a solution based solely on economic incentives and penalties. All the technologies detailed in this thesis have been released under open source licenses and can be readily integrated with real systems.
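The intuition behind equally large blocks and frequency flattening can be sketched in a few lines of Python. The partitioning below is a toy illustration of the general idea, not the encrypted multidimensional index actually constructed in the thesis:

```python
from collections import Counter

def flatten_into_blocks(values, block_size):
    # Sort the values, then group them into equally large blocks.
    # Each block is treated as one atomic unit, so every block identifier
    # occurs with exactly the same frequency, regardless of how skewed
    # the underlying value distribution is.
    ordered = sorted(values)
    return {i // block_size: ordered[i:i + block_size]
            for i in range(0, len(ordered), block_size)}

ages = [21, 21, 21, 21, 34, 34, 40, 52]  # skewed plaintext distribution
blocks = flatten_into_blocks(ages, block_size=2)

# Every block holds exactly 2 tuples: the per-block frequency is flat,
# so block identifiers reveal nothing about value frequencies.
sizes = Counter(len(b) for b in blocks.values())
print(sizes)  # Counter({2: 4})
```

Once each block is encrypted probabilistically as a single unit, the server observes only uniformly sized, uniformly accessed ciphertext blocks, which is what defeats the frequency analysis shown for deterministic encryption.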
User Submitted?: No
Authors: Facchinetti, Dario
Institution: University of Bergamo
Department: School of Doctoral Studies
Advisor:
Degree:
Publisher Location: Bergamo
Pages: 1-187
Data Collections: IPUMS USA
Topics: Methodology and Data Collection, Population Data Science
Countries: