Full Citation
Title: Imputation of Missing Values for Hierarchical US Historical Census Data
Citation Type: Miscellaneous
Publication Year: 2006
ISBN:
ISSN:
DOI:
NSFID:
PMCID:
PMID:
Abstract: The criteria for data gathering vary across censuses and over the course of time and space. As a result, the data are either missing or inconsistent from the point of view of a researcher who wants to compare and contrast data across censuses. As older census data are now being ported to databases, it is possible to address the problem of missing and inconsistent data with the help of data mining and machine learning tools. The problem of missing data can thus be partially addressed if a reliable model for imputing missing values in the census data can be developed. In this paper we address the problem of imputation for the United States historical census microdata for census years 1850 and 1860. The specific imputation problem that we address is that of imputing missing relation codes (relate codes), which signify the relationship between members in a family. We split the data into a training set and a test set, and then build a number of classification models for imputing missing values for the censuses where the relate codes are available. The models that perform sufficiently well on the imputation task, as judged by the evaluation metrics, can then be used for imputing the missing values for the census years where the relate codes are not available.
User Submitted?: No
Authors: Bhatnagar, Nupur; Ahmad, Muhammad A.
Publisher: University of Minnesota
Data Collections: IPUMS USA
Topics: Methodology and Data Collection
Countries: