BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Dirichlet Process Mixture Models for Nested Categorical Data

Citation Type: Dissertation/Thesis

Publication Year: 2015

Abstract: This thesis develops Bayesian latent class models for nested categorical data, e.g., people nested in households. The applications focus on generating synthetic microdata for public release and imputing missing data for household surveys, such as the 2010 U.S. Decennial Census. The first contribution is a set of methods for evaluating disclosure risks in fully synthetic categorical data. I quantify disclosure risks by computing Bayesian posterior probabilities that intruders can learn confidential values given the released data and assumptions about their prior knowledge. I demonstrate the methodology on a subset of data from the American Community Survey (ACS). The methods can be adapted to synthesizers for nested data, as demonstrated in later chapters of the thesis. The second contribution is a novel two-level latent class model for nested categorical data. Here, I assume that all configurations of groups and units are theoretically possible. I use a nested Dirichlet Process prior distribution for the class membership probabilities. The nested structure facilitates simultaneous modeling of variables at both group and unit levels. I illustrate the modeling by generating synthetic data. . .

Url: https://dukespace.lib.duke.edu/dspace/bitstream/handle/10161/9933/Hu_duke_0066D_12907.pdf;sequence=1

User Submitted?: No

Authors: Hu, Jingchen

Institution: Duke University

Department: Statistical Science

Advisor: Jerome P. Reiter

Degree: PhD

Data Collections: IPUMS USA

Topics: Other
