Full Citation
Title: Harnessing the Known Unknowns: Differential Privacy and the 2020 Census
Citation Type: Journal Article
Publication Year: 2022
ISBN:
ISSN:
DOI: 10.1162/99608F92.CB06B469
NSFID:
PMCID:
PMID:
Abstract: This special issue, Differential Privacy for the 2020 U.S. Census: Can We Make Data Both Private and Useful?, provides an entry point to help data scientists across many disciplines adjust to a big change in a key component of our national data infrastructure. The United States Census Bureau is adopting formal differential privacy protections for public products from the 2020 U.S. Decennial Census. This is the first time that a country has released most of its subpopulation counts with formal privacy protections, although certainly not the first time that other official counts have been perturbed for the purpose of disclosure avoidance.
Population censuses are important. Indeed, they may be the oldest statistical products of communal societies. They are mentioned in the Bible (the book of Numbers) and required by Article I, Section 2 of the U.S. Constitution for allocating seats in Congress. After all, as Lord Kelvin noted in 1883: "[W]hen you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind." (William Thomson, Lord Kelvin, Electrical Units of Measurement [1883]) [Lord Kelvin’s observation is often paraphrased to more zippy aphorisms such as, ‘You cannot manage what you cannot measure.’]
These days, each U.S. decennial census plays a role far beyond simply determining how many seats each state holds in Congress. Statistical frames based on Census Bureau counts underlie nearly all the demographic descriptions and many decisions made by government, business, or other organizations in the United States. Massive federal expenditures are distributed according to population estimates based on census data. Furthermore, a number of active and influential research communities depend upon decennial census data products.
Privacy protection for respondents is also important and getting more difficult to achieve. Such protection has long been required by law, in order to prevent harm and to encourage full and honest responses. Recently, though, growing uses of the decennial census, availability of other data sources, and increased computational firepower make protecting the privacy of census respondents more difficult. Fortunately, newly developed formal privacy protection systems can both measure the degree of privacy protection and allow adequate transparency to inform statistical inference on protected data. Previous statistical methods used to protect privacy (such as suppression and swapping observations) lack both of these desirable properties. Nevertheless, adopting a new form of privacy protection for such important data is far from easy. Some of the key challenges include implementation issues confronted by the Census Bureau, understanding analytical implications for data scientists, and managing communication so that all stakeholders can engage effectively with each other and inform the public about the implications of the change.
Url: https://hdsr.mitpress.mit.edu/pub/fgyf5cne/release/1?readingCollection=63678f6d
User Submitted?: No
Authors: Gong, Ruobin; Groshen, Erica L.; Vadhan, Salil
Periodical (Full): Harvard Data Science Review
Issue: Special Issue 2
Volume:
Pages:
Data Collections: IPUMS USA, IPUMS USA - Ancestry Full Count Data
Topics: Population Data Science
Countries: