Full Citation
Title: Unsupervised Graph-Based Entity Resolution for Complex Entities
Citation Type: Journal Article
Publication Year: 2023
ISBN:
ISSN: 1556-4681
DOI: 10.1145/3533016
NSFID:
PMCID:
PMID:
Abstract: Entity resolution (ER) is the process of linking records that refer to the same entity. Traditionally, this process compares attribute values of records to calculate similarities and then classifies pairs of records as referring to the same entity or not based on these similarities. Recently developed graph-based ER approaches combine relationships between records with attribute similarities to improve linkage quality. Most of these approaches only consider databases containing basic entities that have static attribute values and static relationships, such as publications in bibliographic databases. In contrast, temporal record linkage addresses the problem where attribute values of entities can change over time. However, neither existing graph-based ER nor temporal record linkage can achieve high linkage quality on databases with complex entities, where an entity (such as a person) can change its attribute values over time while having different relationships with other entities at different points in time. In this paper we propose an unsupervised graph-based ER framework that is aimed at linking records of complex entities. Our framework provides five key contributions. First, we propagate positive evidence encountered when linking records to use in subsequent links by propagating attribute values that have changed. Second, we employ negative evidence by applying temporal and link constraints to restrict which candidate record pairs to consider for linking. Third, we leverage the ambiguity of attribute values to disambiguate similar records that however belong to different entities. Fourth, we adaptively exploit the structure of relationships to link records that have different relationships. Fifth, using graph measures we refine matched clusters of records by removing likely wrong links between records. 
We conduct extensive experiments on seven real-world data sets from different domains showing that on average our unsupervised graph-based ER framework can improve precision by up to 25% and recall by up to 29% compared to several state-of-the-art ER techniques.
Url: https://doi.org/10.1145/3533016
User Submitted?: No
Authors: Kirielle, Nishadi; Christen, Peter; Ranbaduge, Thilina
Periodical (Full): ACM Transactions on Knowledge Discovery from Data
Issue: 1
Volume: 17
Pages: 1-30
Data Collections: IPUMS USA
Topics: Methodology and Data Collection, Other, Population Data Science
Countries: