What question did this study set out to answer?

This research aims to examine the approximability of k-means clustering in Euclidean spaces depending on the parameter k.

May 29, 2026Open Access

Near-Optimal Bounds for Parameterized Euclidean k-Means

Key Points

This research aims to examine the approximability of k-means clustering in Euclidean spaces depending on the parameter k.
Introduced a new hypothesis called the Exponential Time for Expanders Hypothesis (XXH).
Analyzed existing algorithms and established a lower bound for (1+ε)-approximations to k-means in Euclidean spaces.
Showed the optimality of existing algorithms based on assumptions of XXH.
Establishes a lower bound of no 2^{(k/ε)^{1-o(1)}} ⋅ n^O(1) time algorithm can achieve a (1+ε)-approximation for k-means.
Confirms that the O(n^{kd+1}) exact algorithm is optimal for small k values under XXH.
Matches runtime complexities of existing algorithms with new lower bounds.

Abstract

The k-means problem is a classic objective for modeling clustering in a metric space. Given a set of points in a metric space, the goal is to find k representative points so as to minimize the sum of the squared distances from each point to its closest representative. In this work, we study the approximability of k-means in Euclidean spaces parameterized by the number of clusters, k. In seminal works, de la Vega, Karpinski, Kenyon, and Rabani STOC'03 and Kumar, Sabharwal, and Sen JACM'10 showed how to obtain a (1+ε) -approximation for high-dimensional Euclidean k-means in time 2^ (k/ε) O (1) ⋅ dnO (1). In this work, we introduce a new fine-grained hypothesis called Exponential Time for Expanders Hypothesis (XXH) which roughly asserts that there are no non-trivial exponential time approximation algorithms for the vertex cover problem on near perfect vertex expanders. Assuming XXH, we close the above long line of work on approximating Euclidean k-means by showing that there is no 2^ (k/ε) ^{1-o (1) } ⋅ nO (1) time algorithm achieving a (1+ε) -approximation for k-means in Euclidean space. This lower bound is tight as it matches the algorithm given by Feldman, Monemizadeh, and Sohler SoCG'07 whose runtime is 2O (k/ε) + O (ndk). Furthermore, assuming XXH, we show that the seminal O (n^kd+1) runtime exact algorithm of Inaba, Katoh, and Imai SoCG'94 for k-means is optimal for small values of k.

Mark Helpful

Bookmark

Relay

View Full Paper