What question did this study set out to answer?

The aim is to evaluate different topic modeling methods for improving book index creation.

March 16, 2026

Howes: Natural-language processing (NLP) and indexing, Part 2. Topic modeling

Key points

The study evaluates topic modeling methods as aids to book index creation.
It compares Latent Dirichlet Allocation (LDA) with BERTopic.
The test corpus is the public-domain text 'The cliff ruins of Canyon de Chelly, Arizona.'
Document preprocessing and a modeling pipeline were built for each method.
Visualizations were used to interpret the latent topics each model discovered.
Both models produced coherent, interpretable topics.
BERTopic produced richer, more nuanced topic clusters than LDA.
The author recommends LLM-based topic modeling over LDA for topic discovery in book indexing.
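
The preprocessing step differs between the two methods, and a small sketch may make the contrast concrete. It assumes that LDA receives a bag of cleaned tokens while an embedding-based model such as BERTopic receives whole sentences. The function names, the tiny stopword list, and the sample text are all hypothetical illustrations, not taken from the article or from the test corpus.

```python
import re

# Tiny illustrative stopword list (an assumption; a real pipeline would use a fuller one).
STOPWORDS = {"the", "of", "and", "a", "in", "is", "are", "to", "on", "by", "with", "were"}


def lda_tokens(text):
    """Bag-of-words input: lowercase, keep alphabetic runs, drop stopwords and short tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def embedding_chunks(text):
    """Embedding-model input: whole sentences with grammar and word order left intact."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s]


# Made-up sample text, not a quotation from the test corpus.
sample = "The ruins are built in shallow caves. Walls of stone were laid with mud mortar."

print(lda_tokens(sample))
print(embedding_chunks(sample))
```

The first function discards word order and function words, which is what a count-based model works from. The second keeps each sentence whole, so a sentence-embedding model can use its structure.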

Summary

This article explores how topic modeling can support book index creation, comparing a classic machine learning method, Latent Dirichlet Allocation (LDA), with a large language model (LLM) approach, BERTopic, which builds on SBERT-derived sentence embeddings. Using the public-domain text ‘The cliff ruins of Canyon de Chelly, Arizona’ as a test corpus, the article covers document preprocessing, pipeline construction, and a suite of visualizations that help interpret the latent topics each model discovers. The results show that both techniques generate coherent, interpretable topics, but BERTopic yields richer, more nuanced topic clusters because it works on larger text spans and preserves grammatical structure. Consequently, the author recommends LLM-based topic modeling over traditional LDA for topic discovery in book indexing.
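
To show what "latent topics" means for the LDA side of the comparison, here is a from-scratch toy collapsed Gibbs sampler. It is only a sketch: the article most likely used an existing library, and the function name `lda_gibbs`, the hyperparameter values, and the hand-made `toy_docs` below are assumptions for illustration, not the author's pipeline or data.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over already-tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to the conditional probability.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word


# Hand-made toy documents (hypothetical; not drawn from the test corpus).
toy_docs = [
    ["ruins", "walls", "masonry", "ruins", "kiva"],
    ["walls", "masonry", "kiva", "rooms"],
    ["canyon", "stream", "cliff", "canyon"],
    ["cliff", "stream", "canyon", "talus"],
]

doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(3) if c > 0]
    print(f"topic {k}: {top_words}")
```

Each topic here is just a set of word counts, with no notion of word order or sentence structure. That is the limitation the summary contrasts with BERTopic's embedding-based clusters.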
