Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of the topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.
Hoyle et al. studied this question.
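To make concrete what "automated coherence" based on word co-occurrence frequencies can look like, the following is a minimal sketch of one common variant, normalized pointwise mutual information (NPMI), averaged over all pairs of a topic's top words. The abstract does not name a specific metric, so the choice of NPMI, the document-level co-occurrence counting, the edge-case conventions, and the function name `npmi_coherence` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words.

    Illustrative sketch only: probabilities are estimated as the fraction
    of reference documents that contain a word (or both words of a pair).
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    if not documents:
        raise ValueError("need at least one reference document")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # Assumed convention: words that never co-occur score -1.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Assumed convention: avoids dividing by -log(1) = 0.
            scores.append(1.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy reference corpus (hypothetical data, tokenized documents).
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog"],
    ["stock", "market"],
    ["stock", "market", "trade"],
]
print(npmi_coherence(["cat", "dog"], docs))
print(npmi_coherence(["cat", "dog", "stock"], docs))
```

On this toy corpus, a topic whose words always appear together scores higher than one that mixes unrelated words. A score like this is computed entirely from corpus counts, with no human in the loop, which is the property the abstract calls into question.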