August 2, 2024 | Open Access

Mapping the Unseen in Practice: Comparing Latent Dirichlet Allocation and BERTopic for Navigating Topic Spaces

Abstract

This article examines the strengths and weaknesses of topic modeling for the social studies of science. Over the past decade, Natural Language Processing has opened new research avenues beyond traditional bibliometric approaches such as co-citation, co-authorship, and co-word analysis. Among topic modeling techniques, the most prevalent are Latent Dirichlet Allocation (LDA) and BERTopic. The former is a Bayesian probabilistic model; the latter is rooted in deep learning. It remains unclear what those differences imply in practice and how they contribute to our sociological understanding of the inner workings of science. This paper compares the results obtained by applying LDA and BERTopic to the same dataset, composed of all scientific articles (n=34,797) authored by all biology professors in Switzerland between 2008 and 2020. Although they differ in their operationalization, LDA and BERTopic produce topic spaces with a similar global configuration. However, major differences are observed when focusing on specific multidimensional concepts, such as gene or species. Overall, we stress that topic modeling, when combined with in-depth knowledge of the object under scrutiny, offers highly valuable ground for collaborative interdisciplinary research among scholars from all the social studies of science and beyond.
