Los puntos clave no están disponibles para este artículo en este momento.
K-means clustering plays a vital role in data mining. As an iterative computation, its performance will suffer when applied to tremendous amounts of data, due to poor temporal locality across its iterations. The state-of-the-art streaming algorithm, which streams the data from disk into memory and operates on the partitioned streams, improves temporal locality but can misplace objects in clusters since different partitions are processed locally. This paper presents a collaborative divide-and-conquer algorithm to significantly improve the state-of-the-art, based on two key insights. First, we introduce a break-and-recluster procedure to identify the clusters with misplaced objects. Second, we introduce collaborative seeding between different partitions to accelerate the convergence inside each partition. Compared with the streaming algorithm using a number of wikipedia webpages as our datasets, our collaborative algorithm improves its clustering quality by up to 35.3% with an average of 8.8% while decreasing its execution times from 0.3% to 80.1% with an average of 48.6%.
Cui et al. (Tue,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: