August 1, 2000 · Open Access

IntelliClean


Abstract

Existing data cleaning methods work on the basis of computing the degree of similarity between nearby records in a sorted database. High recall is achieved by accepting records with low degrees of similarity as duplicates, at the cost of lower precision. High precision is achieved analogously at the cost of lower recall. This is the recall-precision dilemma. In this paper, we propose a generic knowledge-based framework for effective data cleaning that implements existing cleaning strategies and more. We develop a new method to compute transitive closure under uncertainty, which handles the merging of groups of inexact duplicate records. Experimental results show that this framework can identify duplicates and anomalies with high recall and precision.
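The abstract names two mechanisms: scoring similarity between nearby records in a sorted database, and computing transitive closure under uncertainty to merge groups of inexact duplicates. The Python sketch that follows illustrates both under stated assumptions. The similarity function (`difflib.SequenceMatcher` ratio), the window size, the thresholds, and the helper names `neighborhood_pairs`, `certainty_closure`, and `merge_groups` are hypothetical. The abstract does not give IntelliClean's actual rules, so combining certainties along a chain by multiplication, and merging two groups only when every cross pair clears the threshold, are stand-in choices rather than the paper's method.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Hypothetical record similarity: case-insensitive difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighborhood_pairs(records, window, threshold):
    """Sorted-neighborhood pass: sort records, compare each with the next
    window - 1 records, and keep pairs whose similarity >= threshold.
    A low threshold raises recall; a high one raises precision."""
    order = sorted(range(len(records)), key=lambda i: records[i].lower())
    pairs = {}
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            score = similarity(records[i], records[j])
            if score >= threshold:
                pairs[(min(i, j), max(i, j))] = score
    return pairs


def certainty_closure(n, pairs):
    """Max-product closure (Floyd-Warshall style): cert[i][j] becomes the best
    product of pairwise scores over any chain linking records i and j.
    ASSUMPTION: multiplying scores along a chain is a stand-in rule."""
    cert = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), s in pairs.items():
        cert[i][j] = cert[j][i] = max(cert[i][j], s)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = cert[i][k] * cert[k][j]
                if via_k > cert[i][j]:
                    cert[i][j] = via_k
    return cert


def merge_groups(n, cert, min_certainty):
    """Greedy merge: visit candidate pairs by descending certainty and join two
    groups only if every cross pair clears min_certainty, so no group rests on
    a chain that is too uncertain. ASSUMPTION: hypothetical merge policy."""
    group = list(range(n))                  # group id of each record
    members = {g: {g} for g in range(n)}    # group id -> record indices
    candidates = sorted(
        ((cert[i][j], i, j) for i in range(n) for j in range(i + 1, n)
         if cert[i][j] >= min_certainty),
        reverse=True,
    )
    for _, i, j in candidates:
        gi, gj = group[i], group[j]
        if gi == gj:
            continue
        if all(cert[a][b] >= min_certainty
               for a, b in product(members[gi], members[gj])):
            for b in members[gj]:
                group[b] = gi
            members[gi] |= members.pop(gj)
    return sorted(sorted(m) for m in members.values())


if __name__ == "__main__":
    # Hypothetical scores: 0~1 and 1~2 match directly; 0~2 only via the chain.
    cert = certainty_closure(3, {(0, 1): 0.9, (1, 2): 0.9})
    print(merge_groups(3, cert, 0.85))  # [[0], [1, 2]]: chained 0.81 too weak
    print(merge_groups(3, cert, 0.80))  # [[0, 1, 2]]: lower bar, larger group
```

Raising `threshold` or `min_certainty` trades recall for precision, which is the dilemma the abstract describes: in the demo, the chained certainty of 0.81 between records 0 and 2 decides whether all three records are merged.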


Cite This Study

Lee et al. (2000). IntelliClean.

synapsesocial.com/papers/6a12ad5d4891eb3ecca41bea https://doi.org/10.1145/347090.347154
