June 13, 2004

On coresets for k-means and k-median clustering

Abstract

In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in ℝ^d, one can compute a weighted set S ⊆ P, of size O(k ε^{-d} log n), such that one can compute the k-median/means clustering on S instead of on P, and get a (1+ε)-approximation. As a result, we improve the fastest known algorithms for (1+ε)-approximate k-means and k-median. Our algorithms have linear running time for fixed k and ε. In addition, we can maintain the (1+ε)-approximate k-median or k-means clustering of a stream when points are only being inserted, using polylogarithmic space and update time.
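
To make the idea of clustering on a weighted subset S instead of on P concrete, here is a minimal Python sketch. It is not the construction from the paper: it assumes a much simpler uniform grid, keeps one input point per cell, and weights it by the number of points in that cell, so it carries no (1+ε) guarantee. The names `kmedian_cost` and `grid_representatives`, the sample points, and the cell width are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def kmedian_cost(weighted_points, centers):
    """Sum over points of weight * distance to the nearest center."""
    return sum(
        w * min(math.dist(p, c) for c in centers)
        for p, w in weighted_points
    )


def grid_representatives(points, cell):
    """Keep one input point per grid cell, weighted by the cell's point count.

    Illustrative assumption only: a uniform grid, not the paper's construction.
    """
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell) for x in p)
        cells[key].append(p)
    # The first point seen in each cell stands in for all points of that cell.
    return [(members[0], len(members)) for members in cells.values()]


# Hypothetical sample data: two small groups of points in the plane.
P = [(0.1, 0.2), (0.15, 0.22), (0.3, 0.1), (5.0, 5.1), (5.2, 5.05), (5.1, 4.9)]
centers = [(0.0, 0.0), (5.0, 5.0)]
cell = 0.5

S = grid_representatives(P, cell)
cost_P = kmedian_cost([(p, 1) for p in P], centers)  # unit weights on P
cost_S = kmedian_cost(S, centers)                     # weighted cost on S

print(len(P), len(S))
print(cost_P, cost_S)
```

Because every point lies in the same cell as its representative, the triangle inequality bounds the k-median cost difference by the number of points times the cell diagonal. This is an additive bound; the result stated in the abstract is the stronger multiplicative (1+ε) guarantee.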
