BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Combining family history and machine learning to link historical records: The Census Tree data set

Citation Type: Journal Article

Publication Year: 2021

ISSN: 0014-4983

DOI: 10.1016/J.EEH.2021.101391

Abstract: A key challenge for research on many questions in the social sciences is that it is difficult to link records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we contribute to recent efforts to create these links with a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. We use these “true” links both to inform the decisions one needs to make when using automated methods to link records and as a training data set for use in a supervised machine learning approach. We describe our procedure and illustrate its potential by linking individuals across the 100% samples of the US censuses from 1900, 1910, and 1920. When linking adjacent censuses, we obtain an overall match rate of 62-65 percent (for over 88.9 million matches), with a false positive rate that is around 6-7 percent and with links that are similar to the population along observable characteristics. Thus, our method allows us to link records with a combination of a high match rate, precision, and representativeness that is beyond the current frontier. Finally, we demonstrate the potential of the data by estimating the degree of intergenerational transmission of literacy between father-son and mother-daughter pairs.

Url: https://www.sciencedirect.com/science/article/pii/S0014498321000024

User Submitted?: No

Authors: Price, Joseph; Buckles, Kasey; Van Leeuwen, Jacob; Riley, Isaac

Periodical (Full): Explorations in Economic History

Volume: 80

Pages: 1-28

Data Collections: IPUMS USA - Ancestry Full Count Data

Topics: Family and Marriage, Population Data Science
