BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: How Well Do Automated Methods Perform in Historical Samples? Evidence from New Ground Truth

Citation Type: Working Paper

Publication Year: 2017

Abstract: New large-scale data linking projects are revolutionizing empirical social science. Outside of selected samples and tightly restricted data enclaves, little is known about the quality of these “big data” or how the methods used to create them shape inferences. This paper evaluates the performance of commonly used automated record-linking algorithms in three high quality historical U.S. samples. Our findings show that (1) no method (including hand linking) consistently produces samples representative of the linkable population; (2) automated linking tends to produce very high rates of false matches, averaging around one third of links across datasets and methods; and (3) false links are systematically (though differently) related to baseline sample characteristics. A final exercise demonstrates the importance of these findings for inferences using linked data. For a common set of records, we show that algorithm assumptions can attenuate estimates of intergenerational income elasticities by almost 50 percent. Although differences in these findings across samples and methods caution against the generalizability of specific error rates, common patterns across multiple datasets offer broad lessons for improving current linking practice.

Url: http://www.nber.org/papers/w24019

User Submitted?: No

Authors: Bailey, Martha; Cole, Connor; Henderson, Morgan; Massey, Catherine

Series Title: NBER Working Paper Series

Publication Number: 24019

Institution: NBER

Pages: 41

Publisher Location: Cambridge, MA

Data Collections: IPUMS USA - Ancestry Full Count Data

Topics: Aging and Retirement, Labor Force and Occupational Structure, Population Data Science

