August 24, 2008

Automatic record linkage using seeded nearest neighbour and support vector machine classification

Abstract

The task of linking databases is an important step in an increasing number of data mining projects, because linked data can contain information that is not available otherwise, or that would require time-consuming and expensive collection of specific data. The aim of linking is to match and aggregate all records that refer to the same entity. One of the major challenges when linking large databases is the efficient and accurate classification of record pairs into matches and non-matches. Traditionally, classification was based on manually set thresholds or on statistical procedures, but many of the more recently developed classification methods are based on supervised learning techniques. These methods therefore require training data, which is often not available in real-world situations or has to be prepared manually, an expensive, cumbersome and time-consuming process.
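To make the traditional threshold approach concrete, here is a minimal sketch. It assumes each candidate record pair has already been compared field by field, giving one similarity value in [0, 1] per field. The record identifiers, the three compared fields and the threshold value 2.0 are hypothetical. This illustrates only the manually set threshold classification the abstract mentions. It is not the seeded nearest neighbour or SVM method of the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical setup: each record pair (id from database A, id from database B)
# is summarised by a comparison vector of per-field similarities in [0, 1],
# e.g. [surname similarity, given-name similarity, postcode similarity].
Pair = Tuple[str, str]
ComparisonVector = List[float]


def classify_by_threshold(vectors: Dict[Pair, ComparisonVector],
                          threshold: float) -> Dict[Pair, str]:
    """Label a pair 'match' if its summed similarity reaches the manually set threshold."""
    labels: Dict[Pair, str] = {}
    for pair, vec in vectors.items():
        score = sum(vec)  # unweighted sum of field similarities
        labels[pair] = "match" if score >= threshold else "non-match"
    return labels


# Hypothetical comparison vectors for three candidate pairs.
example = {
    ("a1", "b1"): [1.0, 0.9, 1.0],  # near-identical records, sum 2.9
    ("a2", "b7"): [0.2, 0.0, 0.5],  # clearly different records, sum 0.7
    ("a3", "b3"): [0.8, 0.6, 0.0],  # ambiguous pair, sum 1.4
}

labels = classify_by_threshold(example, threshold=2.0)
print(labels)
```

The threshold must be chosen by hand for each pair of databases, and ambiguous pairs such as ("a3", "b3") are sensitive to that choice. Supervised classifiers avoid the hand-tuned cut-off but, as noted above, need labelled training pairs instead.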

Author: Peter Christen

DOI: https://doi.org/10.1145/1401890.1401913