May 20, 2014

A collaborative divide-and-conquer K-means clustering algorithm for processing large data

Abstract

K-means clustering plays a vital role in data mining. As an iterative computation, its performance suffers when applied to very large amounts of data, because of poor temporal locality across its iterations. The state-of-the-art streaming algorithm, which streams the data from disk into memory and operates on the partitioned streams, improves temporal locality but can misplace objects in clusters, since the partitions are processed locally and independently. This paper presents a collaborative divide-and-conquer algorithm that significantly improves on the state of the art, based on two key insights. First, we introduce a break-and-recluster procedure to identify the clusters with misplaced objects. Second, we introduce collaborative seeding between different partitions to accelerate convergence inside each partition. Compared with the streaming algorithm on datasets made up of Wikipedia webpages, our collaborative algorithm improves clustering quality by up to 35.3%, with an average of 8.8%, while reducing execution time by 0.3% to 80.1%, with an average of 48.6%.
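The abstract does not spell out how the partitions cooperate, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "collaborative seeding" can be approximated by reusing the centroids found in one partition as the starting seeds for the next, and that partition results are merged by clustering the per-partition centroids. The break-and-recluster procedure is left out because the abstract gives no detail on it. All function names here (`sq_dist`, `centroid`, `kmeans`, `partitioned_kmeans`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd's k-means. Returns (centroids, iterations_used)."""
    centroids = list(seeds)
    for iteration in range(1, max_iter + 1):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [centroid(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            return centroids, iteration
        centroids = updated
    return centroids, max_iter


def partitioned_kmeans(partitions, initial_seeds):
    """Cluster each partition in turn, then merge the partial results.

    Assumption (not from the paper): each partition starts from the
    centroids of the previous one, and the merge step is an unweighted
    k-means over all per-partition centroids.
    """
    seeds = list(initial_seeds)
    partial_centroids = []
    iterations = []
    for part in partitions:
        centroids, used = kmeans(part, seeds)
        partial_centroids.extend(centroids)
        iterations.append(used)
        seeds = centroids  # assumed form of collaborative seeding
    final, _ = kmeans(partial_centroids, seeds)
    return final, iterations
```

Calling `partitioned_kmeans(partitions, initial_seeds)` with a list of point lists and `k` starting points returns the merged centroids together with the number of Lloyd iterations each partition needed. Comparing those counts against runs where every partition starts from the same initial seeds is one way to observe the effect of seeding on convergence.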

Cite This Study

Cui et al. (2014) studied this question.

synapsesocial.com/papers/6a0fa2b8d13714ec96fe674f https://doi.org/10.1145/2597917.2597918
