What does this research mean for the field?

The proposed Robust Adaptive (RA) Cluster Validity Index outperforms classical indices in determining the optimal number of clusters for datasets with mild to moderate overlap, uneven density distribution and outliers, although density-based indices inherently struggle with severe inter-cluster overlap. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

This paper aims to develop a robust adaptive Cluster Validity Index (RA index) to enhance clustering accuracy in overlapping datasets.

May 16, 2026 · Open Access

A Robust Adaptive Clustering Validity Index for Overlapping Data

Key Points

This paper aims to develop a robust adaptive Cluster Validity Index (RA index) to enhance clustering accuracy in overlapping datasets.
Proposed a multiplicative and adaptive Cluster Validity Index based on kernel density functions.
Incorporated density quantiles into the intra-cluster compactness measure and developed a density-based Jeffrey divergence for inter-cluster separability.
Conducted experiments on synthetic and real-world datasets to validate performance against classical indices.
The RA index identified the optimal number of clusters in 88.89% of synthetic datasets (8 out of 9).
Outperformed the classical Calinski-Harabasz (CH), Davies-Bouldin (DB), Silhouette (SIL) and I indices in scenarios with mild to moderate overlap.
Was the most robust of the five compared indices on eight real-world datasets, with limitations on complex datasets that combine severe overlap and outliers.
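
The classical baselines named in the Key Points pick the number of clusters by scoring a partition for each candidate k and keeping the best score. The sketch below shows that selection loop with scikit-learn, assuming k-means partitions, a candidate range of 2 to 8 and a synthetic four-blob dataset; these are illustrative choices, not the paper's experimental protocol. The helper `select_k` is hypothetical, and the I index is not part of scikit-learn, so it is left out.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score, silhouette_score


def select_k(X, k_range=range(2, 9), seed=0):
    """Hypothetical helper: return the k chosen by each classical CVI
    when scoring k-means partitions of X for every k in k_range."""
    ch, db, sil = {}, {}, {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        ch[k] = calinski_harabasz_score(X, labels)  # higher is better
        db[k] = davies_bouldin_score(X, labels)     # lower is better
        sil[k] = silhouette_score(X, labels)        # higher is better
    return {
        "CH": max(ch, key=ch.get),
        "DB": min(db, key=db.get),
        "SIL": max(sil, key=sil.get),
    }


# Four well-separated blobs at the corners of a square (illustrative data only).
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0], [8.0, 8.0]])
X, _ = make_blobs(n_samples=600, centers=centers, cluster_std=0.6, random_state=42)
print(select_k(X))
```

On data this clean all three indices should agree on k = 4; the interesting cases in the paper are overlapping, unevenly dense or outlier-laden data, where such indices tend to disagree. A density-based index could be dropped into the same loop as one more scoring function.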

Abstract

Cluster Validity Indices (CVIs) are a pivotal tool in machine learning for determining the optimal number of clusters. However, traditional CVIs often perform poorly on the complex characteristics common in real-world data, such as inter-cluster overlap, outliers and uneven density distribution. To address this challenge, this paper proposes a multiplicative, adaptive and robust Cluster Validity Index, designated the Robust Adaptive (RA) index. The index takes the kernel density function of sample points as its fundamental tool and reconstructs its two core components. For intra-cluster compactness, it incorporates density quantiles, which markedly improve robustness against outliers; for inter-cluster separability, it uses a density-based Jeffrey divergence to characterize inter-cluster differences in overlapping datasets. To mitigate the effect of bandwidth selection on kernel density estimation, the study adopts strategies including Scott’s and Silverman’s rules of thumb, allowing the index to adapt to the inherent distribution of the data. Comprehensive experiments were conducted on both synthetic and real-world datasets. Compared with the classical indices (CH, DB, SIL, I), which perform well on overlapping datasets, the RA index delivers superior performance under mild to moderate overlap, uneven density distribution and the presence of outliers. On nine synthetic datasets, the RA index correctly identified the optimal number of clusters in eight cases (88.89%), outperforming all the comparison indices.
On eight real-world datasets of diverse scales, dimensionalities and structures, the RA index was also the most robust and effective of the five indices compared. Its failures on complex datasets such as S-set4 and Iris, which combine severe inter-cluster overlap with outliers, indicate that density-based CVIs have inherent limitations on data with high overlap and faint cluster boundaries. This points to a clear direction for future research: constructing new CVIs from the perspective of sparse matrices may offer a feasible way past these limitations.
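
The abstract names the density-based ingredients but not their exact formulas. The following is a minimal sketch of how such components could be computed with SciPy's `gaussian_kde`, assuming a Gaussian kernel, a compactness quantile of q = 0.1 and a plug-in (resubstitution) estimate of the KL terms. The functions `bandwidth_factor`, `fit_kde`, `quantile_compactness` and `jeffrey_divergence` are hypothetical illustrations, not the RA index itself, and the code does not combine them into the paper's multiplicative index.

```python
import numpy as np
from scipy.stats import gaussian_kde


def bandwidth_factor(n, d, rule="scott"):
    """Rule-of-thumb bandwidth factor, as defined by scipy.stats.gaussian_kde.
    The kernel covariance is factor**2 times the data covariance, so the
    bandwidth adapts to the sample size n and to the spread of the data."""
    if rule == "scott":
        return n ** (-1.0 / (d + 4))
    if rule == "silverman":
        return (n * (d + 2) / 4.0) ** (-1.0 / (d + 4))
    raise ValueError(f"unknown rule: {rule!r}")


def fit_kde(X, rule="scott"):
    """Fit a Gaussian KDE to an (n, d) array; SciPy expects shape (d, n)."""
    n, d = X.shape
    return gaussian_kde(X.T, bw_method=bandwidth_factor(n, d, rule))


def quantile_compactness(X, q=0.1, rule="scott"):
    """Hypothetical compactness score: the q-quantile of the cluster's own KDE
    evaluated at its points. A low quantile, rather than the minimum, keeps a
    handful of isolated low-density points from dictating the score.
    Higher values mean a denser, more compact cluster."""
    density = fit_kde(X, rule)(X.T)
    return float(np.quantile(density, q))


def jeffrey_divergence(A, B, rule="scott"):
    """Plug-in estimate of J(P, Q) = KL(P||Q) + KL(Q||P) between the KDEs of
    clusters A and B. Each KL term averages log-density ratios over that
    cluster's own points. Larger values mean better-separated clusters."""
    p, q = fit_kde(A, rule), fit_kde(B, rule)
    kl_pq = np.mean(p.logpdf(A.T) - q.logpdf(A.T))
    kl_qp = np.mean(q.logpdf(B.T) - p.logpdf(B.T))
    return float(kl_pq + kl_qp)


rng = np.random.default_rng(0)
Z1, Z2 = rng.normal(size=(200, 2)), rng.normal(size=(200, 2))
print("J, well separated:", jeffrey_divergence(Z1, Z2 + [6.0, 0.0]))
print("J, heavily overlapping:", jeffrey_divergence(Z1, Z2 + [0.5, 0.0]))
print("compactness, tight vs loose:",
      quantile_compactness(0.5 * Z1), quantile_compactness(2.0 * Z1))
```

Because the bandwidth factor multiplies the data covariance, rescaling a cluster rescales its kernel as well. In two dimensions the tight cluster (0.5 × Z1) therefore scores about 16 times the compactness of the loose one (2 × Z1). Note that outliers still inflate the covariance-based bandwidth, so the quantile only partly shields this sketch from them.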
