This article explores how digital innovations, particularly machine learning and natural language processing, can streamline and enhance workflows in historical climatology. The field has traditionally relied on time-consuming manual analysis of historical documents, but it now benefits from modern digital tools at every stage of research, from source discovery to publication. Focusing on the classification of large, unstructured textual data, the study examines methods ranging from manual keyword searches and Bayesian models to advanced large language models. Using the tambora.org corpus, it extracts and categorizes references to weather extremes, such as thunderstorms and heavy rainfall, and to their impacts on mobility. The paper compares these approaches in terms of accuracy, resource demands (runtime and memory), and their ability to interpret historical language. It argues that digital methods, especially AI, can transform the extraction and classification of climate data from historical texts and give researchers in historical climatology substantial support.
Kahle et al. studied this question.