Data cleaning with guaranteed reliability is hard to achieve without accessing external sources, since the truth is not necessarily discoverable from the data at hand. Furthermore, even when external sources are available, mainly knowledge bases and humans, leveraging them effectively still poses many challenges, such as aligning heterogeneous data sources and decomposing a complex task into simpler units that humans can handle. We present KATARA, a novel end-to-end data cleaning system powered by knowledge bases and crowdsourcing. Given a table, a KB, and a crowd, KATARA (i) interprets the table semantics with respect to the given KB; (ii) identifies correct and wrong data; and (iii) generates the top-k possible repairs for the wrong data. Users will have the opportunity to experience the following features of KATARA: (1) Easy specification: users can define a KATARA job through a browser-based specification; (2) Pattern validation: users can help the system resolve ambiguity among the different table patterns (i.e., table semantics) discovered by KATARA; (3) Data annotation: users can act as internal crowd workers, helping KATARA annotate data. Moreover, KATARA visualizes the annotated data as correct data validated by the KB, correct data jointly validated by the KB and the crowd, or erroneous tuples along with their possible repairs.
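To make steps (ii) and (iii) concrete, here is a minimal sketch of the three-way annotation described above, under simplifying assumptions that are not taken from the paper: the KB is a set of (subject, predicate, object) triples, the table pattern has already been validated and maps pairs of column indexes to KB predicates, the crowd is simulated by a plain function, and candidate repairs are ranked by the number of changed cells. All names (`annotate`, `top_k_repairs`, `crowd_says_correct`) are hypothetical; this is an illustration, not KATARA's actual repair algorithm.

```python
import itertools
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Row = Tuple[str, ...]
Pattern = Dict[Tuple[int, int], str]  # (subject column, object column) -> KB predicate (assumed form)


def satisfies(row: Row, pattern: Pattern, kb: Set[Triple]) -> bool:
    """True if every relationship in the pattern holds for this row in the KB."""
    return all((row[s], p, row[o]) in kb for (s, o), p in pattern.items())


def top_k_repairs(row: Row, pattern: Pattern, kb: Set[Triple], k: int) -> List[Row]:
    """Hypothetical repair ranking: KB-consistent rows with the fewest changed cells."""
    cols = sorted({c for edge in pattern for c in edge})
    # Candidate values per column: the original value plus any KB value usable there.
    domains = {c: {row[c]} for c in cols}
    for (s, o), p in pattern.items():
        for ts, tp, to in kb:
            if tp == p:
                domains[s].add(ts)
                domains[o].add(to)
    scored = []
    for values in itertools.product(*(sorted(domains[c]) for c in cols)):
        cand = list(row)
        for c, v in zip(cols, values):
            cand[c] = v
        if satisfies(tuple(cand), pattern, kb):
            cost = sum(a != b for a, b in zip(row, cand))  # number of edited cells
            scored.append((cost, tuple(cand)))
    scored.sort()  # cheapest first; ties broken lexicographically
    return [cand for _, cand in scored[:k]]


def annotate(
    table: List[Row],
    pattern: Pattern,
    kb: Set[Triple],
    crowd_says_correct: Callable[[Row], bool],  # stand-in for real crowd workers
    k: int = 2,
) -> List[Tuple[Row, str, List[Row]]]:
    results = []
    for row in table:
        if satisfies(row, pattern, kb):
            results.append((row, "kb-validated", []))
        elif crowd_says_correct(row):
            results.append((row, "crowd-validated", []))
        else:
            results.append((row, "erroneous", top_k_repairs(row, pattern, kb, k)))
    return results


if __name__ == "__main__":
    kb = {("Italy", "hasCapital", "Rome"),
          ("Germany", "hasCapital", "Berlin"),
          ("France", "hasCapital", "Paris")}
    pattern = {(0, 1): "hasCapital"}
    table = [("Italy", "Rome"), ("Germany", "Munich"), ("Chile", "Santiago")]
    crowd = lambda row: row == ("Chile", "Santiago")  # simulated crowd answer
    for result in annotate(table, pattern, kb, crowd):
        print(result)
```

In this toy run, ("Italy", "Rome") is validated by the KB alone, ("Chile", "Santiago") is missing from the KB but confirmed by the simulated crowd, and ("Germany", "Munich") is flagged as erroneous with ("Germany", "Berlin") ranked first because it changes only one cell. Exhaustively enumerating candidate values is only workable for tiny examples; a real system would need a far more selective candidate search.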
Chu et al. studied this question.