February 27, 2024 (Open Access)

Approximate auto-clustering after bringing down the dimensionality.

Abstract

As data generation escalates, particularly in sectors such as healthcare, the abundance of diverse datasets makes effective analysis a formidable challenge. Clustering, which groups data into clusters with shared characteristics, has emerged as a crucial tool in this landscape. Our paper introduces an approach to efficiently tackle the K-Means problem for large datasets. The data is first transformed with Principal Component Analysis (PCA) for dimensionality reduction, and the K-Means algorithm is then applied in the reduced space, yielding significant computational advantages over the traditional approach of clustering in the original dimensionality. The approach integrates automated K-Means clustering into the dimension-reduction strategy, improving the accuracy and scalability of clustering. By iteratively grouping data points based on their similarity, K-Means captures complex relationships between points, offering enhanced accuracy for datasets with intricate structures. The study compares results using average inter- and intra-cluster distances, which reveal consistent optimal cluster numbers and reinforce the efficacy of our approach in achieving auto-clustering with K-Means while reducing computational time. Our research thus presents a practical solution for clustering large datasets, showing how K-Means with auto-clustering can be adapted through PCA. Rigorous comparative analysis establishes its effectiveness for consistent clustering results, positioning it as a potent tool for data analysis in domains grappling with substantial data generation.
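The pipeline described in the abstract (reduce with PCA, run K-Means in the reduced space, pick the number of clusters automatically) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The abstract does not specify the selection criterion beyond "average inter and intra-cluster distances", so the sketch assumes the silhouette coefficient, a standard score built from the mean intra-cluster distance and the mean distance to the nearest other cluster. The function names (`pca_reduce`, `kmeans`, `silhouette`, `auto_cluster`), the farthest-point initialisation (chosen only to keep the example deterministic), the candidate range of k, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of the centered data)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation (an assumption)."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen centroid.
        d = np.linalg.norm(X[:, None] - np.array(centroids)[None], axis=2).min(axis=1)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster became empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def silhouette(X, labels):
    """Mean of (b - a) / max(a, b), where a is the mean intra-cluster distance
    and b is the mean distance to the nearest other cluster."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)  # pairwise distances
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # D[i, i] == 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def auto_cluster(X, n_components=2, k_range=range(2, 7)):
    """Reduce with PCA, run K-Means for each candidate k, keep the best-scoring k."""
    Z = pca_reduce(X, n_components)
    labelings = {k: kmeans(Z, k)[0] for k in k_range}
    scores = {k: silhouette(Z, lab) for k, lab in labelings.items()}
    best_k = max(scores, key=scores.get)
    return best_k, labelings[best_k], scores


# Synthetic demo data (hypothetical): three well-separated blobs in 20 dimensions.
rng = np.random.default_rng(0)
centers = np.zeros((3, 20))
centers[1, 0] = 10.0
centers[2, 1] = 10.0
X = np.vstack([c + 0.5 * rng.normal(size=(100, 20)) for c in centers])

best_k, labels, scores = auto_cluster(X)
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

Because K-Means and the scoring step both run on the two-dimensional projection rather than the original 20 dimensions, every distance computation is cheaper, which is the source of the computational saving the abstract refers to. On real data, the number of retained components would typically be chosen from the explained variance rather than fixed at two.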
