Key points are not available for this paper at this time.
Constructing small-sized coresets for various clustering problems in different metric spaces has attracted significant attention for the past decade. A central problem in the coreset literature is to understand what is the best possible coreset size for (k, z) -clustering in Euclidean space. While there has been significant progress in the problem, there is still a gap between the state-of-the-art upper and lower bounds. For instance, the best known upper bound for k-means (z=2) is minO (k3/2 ε−2), O (k ε−4) Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar, NeurIPS'22, while the best known lower bound is Ω (kε−2) Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22. In this paper, we make significant progress on both upper and lower bounds. For a large range of parameters (i. e. , ε, k), we have a complete understanding of the optimal coreset size. In particular, we obtain the following results: (1) We present a new coreset lower bound Ω (k ε−z−2) for Euclidean (k, z) -clustering when ε ≥ Ω (k−1/ (z+2) ). In view of the prior upper bound Õz (k ε−z−2) Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22, the bound is optimal. The new lower bound is surprising since Ω (kε−2) Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22 is "conjectured" to be the correct bound in some recent works (see e. g. , Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22; Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar, NeurIPS'22). Our new lower bound instance is a delicate construction with multiple clusters of points, which is a significant departure from the previous construction in Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22 that contains a single cluster of points. The new lower bound also implies improved lower bounds for (k, z) -clustering in doubling metrics. (2) For the upper bound, we provide efficient coreset construction algorithms for (k, z) -clustering with improved or optimal coreset sizes in several metric spaces. In particular, we provide an Õz (k2z+2/z+2 ε−2) -sized coreset, with a unfied analysis, for (k, z) -clustering for all z≥ 1 in Euclidean space. This upper bound improves upon the Õz (k2ε−2) upper bound by Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22 (when k≤ ε−1), and matches the recent independent results Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar, NeurIPS'22 for k-median and k-means (z=1, 2) and extends them to all z≥ 1.
Huang et al. (Mon,) studied this question.