BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Unsupervised Graph-Based Entity Resolution for Complex Entities

Citation Type: Journal Article

Publication Year: 2023

ISSN: 1556-4681

DOI: 10.1145/3533016

Abstract: Entity resolution (ER) is the process of linking records that refer to the same entity. Traditionally, this process compares attribute values of records to calculate similarities and then classifies pairs of records as referring to the same entity or not based on these similarities. Recently developed graph-based ER approaches combine relationships between records with attribute similarities to improve linkage quality. Most of these approaches only consider databases containing basic entities that have static attribute values and static relationships, such as publications in bibliographic databases. In contrast, temporal record linkage addresses the problem where attribute values of entities can change over time. However, neither existing graph-based ER nor temporal record linkage can achieve high linkage quality on databases with complex entities, where an entity (such as a person) can change its attribute values over time while having different relationships with other entities at different points in time. In this paper, we propose an unsupervised graph-based ER framework that is aimed at linking records of complex entities. Our framework provides five key contributions. First, we propagate positive evidence encountered when linking records to use in subsequent links by propagating attribute values that have changed. Second, we employ negative evidence by applying temporal and link constraints to restrict which candidate record pairs to consider for linking. Third, we leverage the ambiguity of attribute values to disambiguate similar records that nevertheless belong to different entities. Fourth, we adaptively exploit the structure of relationships to link records that have different relationships. Fifth, using graph measures, we refine matched clusters of records by removing likely wrong links between records. We conduct extensive experiments on seven real-world data sets from different domains showing that on average our unsupervised graph-based ER framework can improve precision by up to 25% and recall by up to 29% compared to several state-of-the-art ER techniques.

Url: https://doi.org/10.1145/3533016

User Submitted?: No

Authors: Kirielle, Nishadi; Christen, Peter; Ranbaduge, Thilina

Periodical (Full): ACM Transactions on Knowledge Discovery from Data

Issue: 1

Volume: 17

Pages: 1-30

Data Collections: IPUMS USA

Topics: Methodology and Data Collection, Other, Population Data Science

Countries:
