What question did this study set out to answer?

This research investigates the relationship between two methods for identifying matching records across datasets.

May 12, 2026 | Open Access

Linking the Comparison and Graphical Approaches to Bipartite Matching

Key Points

Comparison of the Fellegi–Sunter model with the graphical record linkage model.
Development of a unified estimation framework using a classification expectation-maximization algorithm.
Empirical validation on simulations and a benchmark dataset.
Both models achieve satisfactory and mostly equivalent performance in simulations and on a real benchmark dataset.
The proposed estimation method incorporates the problem constraints while remaining computationally efficient.
Direct relationships between model parameters were established under a common data model.

Abstract

Bipartite record linkage aims to identify observations that refer to the same individual, called coreferent observations, across two distinct non-duplicated datasets. The two main approaches to this task are the Fellegi–Sunter model, which relies on pairwise comparisons of observations, and the graphical record linkage model, which directly models the data and groups together coreferent observations. In this paper, we investigate the similarities between these two methods. We show that both models can be expressed in terms of a latent binary matrix indicating coreferent record pairs, that they can be framed as particular latent class analysis models, and that they admit a direct relationship between their parameters under a common data model. Moreover, we propose a unified estimation framework based on a classification expectation–maximization algorithm. The proposed estimation method properly incorporates the problem constraints while still allowing for a computationally efficient implementation. It also allows the same distributional assumptions on the linkage distribution to be used interchangeably between the two models. Empirical results using the proposed estimation method demonstrate satisfactory and mostly equivalent performance for the two models, both on simulations and on a real dataset commonly used as a benchmark for record linkage.
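
For readers unfamiliar with the setup, the latent binary matrix and the problem constraints mentioned in the abstract can be written out as follows. This is a minimal sketch in standard record linkage notation, not the paper's own: the symbols n_1, n_2, \Delta, \gamma_{ij}, m, u and w_{ij} are assumptions introduced here for illustration, and the final line assumes a linkage distribution that is uniform over feasible matchings.

```latex
% Assumed notation: dataset A has n_1 records, dataset B has n_2 records.
% Latent binary matrix of coreferent pairs:
\Delta = [\Delta_{ij}] \in \{0,1\}^{n_1 \times n_2},
\qquad
\Delta_{ij} = 1 \iff \text{record } i \in A \text{ and record } j \in B \text{ are coreferent}.

% Because neither dataset contains duplicates, each record has at most one match:
\sum_{j=1}^{n_2} \Delta_{ij} \le 1 \quad \text{for all } i,
\qquad
\sum_{i=1}^{n_1} \Delta_{ij} \le 1 \quad \text{for all } j.

% Fellegi--Sunter: each pair (i,j) is summarised by a comparison vector \gamma_{ij}, with
m(\gamma) = P(\gamma_{ij} = \gamma \mid \Delta_{ij} = 1),
\qquad
u(\gamma) = P(\gamma_{ij} = \gamma \mid \Delta_{ij} = 0),
\qquad
w_{ij} = \log \frac{m(\gamma_{ij})}{u(\gamma_{ij})}.

% Assumption (uniform linkage distribution): with m and u held fixed, a classification
% step that respects the constraints reduces to a linear assignment problem:
\hat{\Delta} = \arg\max_{\Delta} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \Delta_{ij}\, w_{ij}
\quad \text{subject to the row and column constraints above.}
```

The row and column constraints are what distinguish bipartite matching from independent pairwise classification: a pair with a high weight may still be left unlinked if one of its records is better matched elsewhere.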
