Key points are not available for this paper at this time.
We propose a mathematical formulation for the notion of optimal projective cluster, starting from natural requirements on the density of points in subspaces.This allows us to develop a Monte Carlo algorithm for iteratively computing projective clusters.We prove that the computed clusters are good with high probability.We implemented a modified version of the algorithm, using heuristics to speed up computation.Our extensive experiments show that our method is significantly more accurate than previous approaches.In particular, we use our techniques to build a classifier for detecting rotated human faces in cluttered images. PROJECTIVE CLUSTERINGClustering is a widely used technique for data mining, indexing, and classification.Many practical methods proposed in the last few years, such as CLARANS 11, BIRCH 15, DBSCAN 5,6, and CURE 7, are "full-dimensional," in the sense that they give equal importance to all the dimensions when computing the distance between two points.While such approaches have proven successful for low-dimensional datasets, their accuracy and/or efficiency decrease significantly in higher dimensional spaces (see 9 for an excellent analysis and discussion).The reason for this performance deterioration is the so-called dimensionality curse.Recent research shows that for moderate-to-high dimensional spaces (tens or hundreds of dimensions), a full-dimensional distance is often irrelevant, as the farthest neighbor of a point is expected to be almost as Ý
Procopiuc et al. (Tue,) studied this question.