What question did this study set out to answer?

The study asks how to find, in a single step, both the optimal partition of units (clustering) and the optimal reduced subspace of variables, while keeping the resulting latent variables interpretable.

March 1, 2026

drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction

Key points

drclust is an R package for simultaneous clustering and dimensionality reduction.
Double K-means, reduced K-means, and factorial K-means find the optimal partition of units and the optimal subspace of variables at the same time.
The models are implemented in C++ so that large data matrices can be processed in a reasonable amount of time.
These simultaneous methods target Big Data settings, where both units and variables need to be reduced.
K-means with disjoint principal components both partitions the units and yields interpretable latent variables.
Disjoint principal component analysis and disjoint factor analysis focus on interpretability of the latent variables.
The disjoint methods produce the sparsest loading matrix.
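
To make "sparse loading matrix" concrete, the sketch below builds a loading matrix with the disjoint structure: each variable loads on exactly one component, so a matrix for J variables has only J nonzero entries. It is an illustration only. The function name `disjoint_loadings`, the variable grouping, and the equal within-group weights are all assumptions made for the example; they are not drclust functions or estimated loadings.

```python
import math


def disjoint_loadings(groups, n_components):
    """Build a J x Q loading matrix where variable j loads only on component groups[j]."""
    sizes = [groups.count(q) for q in range(n_components)]
    # Assumption for illustration: equal weights within a group,
    # scaled so that each column has unit length.
    return [[1.0 / math.sqrt(sizes[g]) if q == g else 0.0
             for q in range(n_components)]
            for g in groups]


# Hypothetical assignment of 5 variables to 2 components.
groups = [0, 0, 1, 1, 1]
A = disjoint_loadings(groups, 2)

# One nonzero per row, so J = 5 nonzeros in a 5 x 2 matrix.
nonzeros = sum(1 for row in A for a in row if a != 0.0)
```

Because the columns have non-overlapping supports, they are automatically orthogonal, and each latent variable can be read directly as a summary of its own group of variables.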

Summary

The primary objective of simultaneous methodologies for clustering and variable reduction is to identify both the optimal partition of units and the optimal subspace of variables, all at once. The optimality is typically determined using least squares or maximum likelihood estimation methods. These simultaneous techniques are particularly useful when working with Big Data, where the reduction (synthesis) is essential for both units and variables. Furthermore, a secondary objective of reducing variables through a subspace is to enhance the interpretability of the latent variables identified by the subspace using specific methodologies. The drclust package implements double K-means (KM), reduced KM, and factorial KM to address the primary objective. KM with disjoint principal components addresses both the primary and secondary objectives, while disjoint principal component analysis and disjoint factor analysis address the latter, producing the sparsest loading matrix. The models are implemented in C++ for faster execution, processing large data matrices in a reasonable amount of time.
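
As a rough illustration of the double K-means idea mentioned above, the sketch below partitions units (rows) and variables (columns) at the same time by alternating least-squares reassignments around a matrix of block centroids. It is a minimal pure-Python sketch under stated assumptions, not the package's C++ implementation: the function names, the toy data, the fixed starting partitions, and the handling of empty blocks are all hypothetical choices made for the example.

```python
def block_means(X, rows, cols, K, Q):
    """Centroid matrix: mean of X within each (unit cluster, variable cluster) block."""
    total = [[0.0] * Q for _ in range(K)]
    count = [[0] * Q for _ in range(K)]
    for i, k in enumerate(rows):
        for j, q in enumerate(cols):
            total[k][q] += X[i][j]
            count[k][q] += 1
    # Simplification for this sketch: an empty block gets centroid 0.0.
    return [[total[k][q] / count[k][q] if count[k][q] else 0.0
             for q in range(Q)] for k in range(K)]


def row_cost(X, i, k, cols, C):
    """Squared error of unit i if it were assigned to unit cluster k."""
    return sum((X[i][j] - C[k][cols[j]]) ** 2 for j in range(len(cols)))


def col_cost(X, j, q, rows, C):
    """Squared error of variable j if it were assigned to variable cluster q."""
    return sum((X[i][j] - C[rows[i]][q]) ** 2 for i in range(len(rows)))


def double_kmeans(X, rows, cols, K, Q, max_iter=50):
    """Alternate unit and variable reassignment until no label changes."""
    rows, cols = list(rows), list(cols)
    for _ in range(max_iter):
        C = block_means(X, rows, cols, K, Q)
        new_rows = [min(range(K), key=lambda k: row_cost(X, i, k, cols, C))
                    for i in range(len(X))]
        C = block_means(X, new_rows, cols, K, Q)
        new_cols = [min(range(Q), key=lambda q: col_cost(X, j, q, new_rows, C))
                    for j in range(len(X[0]))]
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols, block_means(X, rows, cols, K, Q)


# Toy 4 x 4 matrix with two groups of units and two groups of variables.
X = [
    [1.0, 1.1, 5.0, 5.2],
    [0.9, 1.0, 5.1, 4.9],
    [8.0, 8.1, 2.0, 2.1],
    [7.9, 8.2, 1.9, 2.0],
]
units, variables, centroids = double_kmeans(
    X, rows=[0, 0, 0, 1], cols=[0, 0, 0, 1], K=2, Q=2
)
# units -> [0, 0, 1, 1], variables -> [0, 0, 1, 1]
```

On this toy matrix the procedure recovers the two blocks of units and the two blocks of variables, and `centroids` holds one mean per block. A practical implementation would use several random starts, since alternating reassignment only guarantees a local optimum of the least-squares criterion.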
