BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Combining Family History and Machine Learning to Link Historical Records

Citation Type: Working Paper

Publication Year: 2019

DOI: 10.3386/w26227

Abstract: A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these “true” links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for use in supervised machine learning methods. We describe the procedure we use and illustrate the potential of our approach by linking individuals across the 100% samples of the US decennial censuses from 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods.

Url: https://www.nber.org/papers/w26227.pdf

User Submitted?: No

Authors: Price, Joseph; Buckles, Kasey; Van Leeuwen, Jacob; Riley, Isaac

Series Title: NBER Working Paper Series

Publication Number: 26227

Institution: National Bureau of Economic Research

Pages: 33

Publisher Location: Cambridge, MA

Data Collections: IPUMS USA

Topics: Other

Countries: United States
