IPUMS.org Home Page

BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Automated Census Record Linking

Citation Type: Miscellaneous

Publication Year: 2014

Abstract: Thanks to the availability of new historical census sources and advances in record linking technology, economic historians are becoming big data genealogists. Linking individuals over time and between databases has opened up new avenues for research into intergenerational mobility, assimilation, discrimination, and the returns to education. To take advantage of these new research opportunities, scholars need to be able to accurately and efficiently match historical records and produce an unbiased dataset of links for downstream analysis. I detail a standard and transparent census matching technique for constructing linked samples that can be replicated across a variety of cases. The procedure applies insights from machine learning classification and text comparison to the well known problem of record linkage, but with a focus on the sorts of cost and benefits of working with historical data. I begin by extracting a subset of possible matches for each record, and then use training data to tune a matching algorithm that attempts to minimize both false positives and false negatives, taking into account the inherent noise in historical records. To make the procedure precise, I trace its application to an example from my own work, linking children from the 1915 Iowa State Census tot heir adult-selves in the 1940 federal Census. In addition, I provided guidance on a number of practical questions, including how large the training data needs to be relative to the sample.

Url: http://scholar.harvard.edu/files/jfeigenbaum/files/feigenbaum-censuslink.pdf

User Submitted?: No

Authors: Feigenbaum, James J.

Publisher: Harvard University

Data Collections: IPUMS USA

Topics: Methodology and Data Collection

Countries:

IPUMS NHGIS NAPP IHIS ATUS Terrapop