Full Citation
Title: Combining Family History and Machine Learning to Link Historical Records
Citation Type: Working Paper
Publication Year: 2019
DOI: 10.3386/w26227
Abstract: A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these “true” links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for use in supervised machine learning methods. We describe the procedure we use and illustrate the potential of our approach by linking individuals across the 100% samples of the US decennial censuses from 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods.
Url: https://www.nber.org/papers/w26227.pdf
User Submitted?: No
Authors: Price, Joseph; Buckles, Kasey; Van Leeuwen, Jacob; Riley, Isaac
Series Title: NBER Working Paper Series
Publication Number: 26227
Institution: National Bureau of Economic Research
Pages: 33
Publisher Location: Cambridge, MA
Data Collections: IPUMS USA
Topics: Other
Countries: United States