Abstract Despite the inherent lack of a ground truth in clustering, there is broad consensus on how to define a cluster in the continuous setting. Conversely, the definition remains controversial in the presence of categorical data. We propose a novel notion of cluster based on the dual concepts of high frequency and variable association. We show that this notion aligns with the one provided by modal clustering in the continuous setting, and that it allows us to borrow and adapt existing operational tools to develop a novel procedure that automatically determines the number of clusters. The method is illustrated on real data and tested via simulations.
Corsini et al. (Thu,) studied this question.