Abstract: As data generation escalates, particularly in sectors such as healthcare, the abundance of diverse datasets poses a formidable challenge for effective analysis. Clustering, which groups data points that share characteristics, has emerged as a crucial tool in this landscape. Our paper introduces an approach for efficiently tackling the K-Means problem on large datasets. The data is first transformed with Principal Component Analysis (PCA) for dimensionality reduction, and the K-Means algorithm is then applied in the reduced space, which yields significant computational advantages over running K-Means on the original features. The approach integrates automated K-Means clustering into the dimension-reduction strategy, improving the accuracy and scalability of clustering. By iteratively grouping data points according to their similarity, K-Means captures complex relationships between points and offers enhanced accuracy for datasets with intricate structures. The study compares results using average inter-cluster and intra-cluster distances. These comparisons reveal consistent optimal cluster numbers, which supports the efficacy of the approach in achieving auto-clustering with K-Means while reducing computational time. For the clustering domain, our research presents a practical solution for large datasets and demonstrates the adaptability of the K-Means algorithm with auto-clustering through PCA. Comparative analysis establishes its effectiveness in producing consistent clustering results, positioning it as a potent tool for data analysis in domains that face substantial data generation challenges.
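The pipeline described in the abstract (PCA, then K-Means in the reduced space, then a comparison of average intra-cluster and inter-cluster distances across candidate cluster counts) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (pca_reduce, kmeans, cluster_distances, scan_k), the random initialization, and the exact distance definitions (mean point-to-centroid distance for intra-cluster, mean pairwise centroid distance for inter-cluster) are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components, computed via SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # coordinates in the reduced space


def _assign(X, centers):
    """Index of the nearest center for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers (an assumption)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers


def cluster_distances(X, labels, centers):
    """Return (average intra-cluster, average inter-cluster) distance.

    Assumed definitions: intra = mean distance from each point to its own
    centroid; inter = mean distance over all pairs of centroids.
    """
    intra = np.linalg.norm(X - centers[labels], axis=1).mean()
    i, j = np.triu_indices(len(centers), k=1)
    inter = np.linalg.norm(centers[i] - centers[j], axis=1).mean()
    return intra, inter


def scan_k(X, k_values, seed=0):
    """Run K-Means for each candidate k and collect both distances."""
    results = {}
    for k in k_values:
        labels, centers = kmeans(X, k, seed=seed)
        results[k] = cluster_distances(X, labels, centers)
    return results


if __name__ == "__main__":
    # Synthetic data, purely for demonstration.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(c, 1.0, (200, 10)) for c in (0.0, 8.0, 16.0)])
    Z = pca_reduce(X, n_components=2)
    for k, (intra, inter) in scan_k(Z, range(2, 7)).items():
        print(f"k={k}: intra={intra:.3f} inter={inter:.3f}")
```

Running the same scan on the original features and on the PCA-reduced features, and checking whether both favor the same cluster count, is one way to mirror the comparison the abstract describes. How the two distances are combined into a selection rule (for example, a ratio or an elbow criterion) is left open here because the abstract does not state it.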
Tejas et al. (Tue,) studied this question.