Learning to match and cluster large high-dimensional data sets for data integration | Synapse