Full Citation
Title: Breakthroughs in Historical Record Linking Using Genealogy Data: The Census Tree Project
Citation Type: Miscellaneous
Publication Year: 2023
ISBN:
ISSN:
DOI:
NSFID:
PMCID:
PMID:
Abstract: The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. These links allow researchers to construct a longitudinal dataset that is highly representative of the population, and that includes women, Black Americans, and other under-represented populations at unprecedented rates. In this paper, we describe our process for creating the Census Tree, beginning with a collection of over 317 million links contributed by the users of a free online genealogy platform. We then use these links as training data for a machine learning algorithm to make tens of millions of new matches. Finally, we incorporate other recent efforts to link the historical U.S. censuses and introduce a procedure for filtering the links and adjudicating disagreements. Our complete Census Tree achieves match rates between adjacent censuses of 69 to 86% for men and 58 to 79% for women, with over 41.5 million links for Black Americans.
User Submitted?: No
Authors: Buckles, Kasey; Haws, Adrian; Price, Joseph; Wilbert, Haley
Publisher:
Data Collections: IPUMS USA - Ancestry Full Count Data
Topics: Population Data Science, Race and Ethnicity
Countries: