Los puntos clave no están disponibles para este artículo en este momento.
Data integrated from multiple sources may contain inconsistencies that violate integrity constraints. The constraint repair problem attempts to find "low cost" changes that, when applied, will cause the constraints to be satisfied. While in most previous work repair cost is stated in terms of tuple insertions and deletions, we follow recent work to define a database repair as a set of value modifications. In this context, we introduce a novel cost framework that allows for the application of techniques from record-linkage to the search for good repairs. We prove that finding minimal-cost repairs in this model is NP-complete in the size of the database, and introduce an approach to heuristic repair-construction based on equivalence classes of attribute values. Following this approach, we define two greedy algorithms. While these simple algorithms take time cubic in the size of the database, we develop optimizations inspired by algorithms for duplicate-record detection that greatly improve scalability. We evaluate our framework and algorithms on synthetic and real data, and show that our proposed optimizations greatly improve performance at little or no cost in repair quality.
Building similarity graph...
Analyzing shared references across papers
Loading...
Philip Bohannon
Rutgers, The State University of New Jersey
Wenfei Fan
Shanghai International Studies University
Michael Flaster
Alcatel Lucent (Germany)
University of Edinburgh
Alcatel Lucent (Germany)
Bell (Canada)
Building similarity graph...
Analyzing shared references across papers
Loading...
Bohannon et al. (Tue,) studied this question.
synapsesocial.com/papers/6a1ac8f07ff99bba06462a09 — DOI: https://doi.org/10.1145/1066157.1066175