Accurate dating of historical texts is essential for understanding cultural and historical narratives. However, traditional methods such as paleographic and physical examination can be subjective, costly, and potentially damaging to manuscripts. This paper introduces a machine learning approach that predicts when a historical text was written, using named entities (specifically, person and place names) as temporal markers. Using a dataset from Trismegistos, which includes metadata on the earliest and latest possible writing dates, we apply regression models to estimate the date of composition. Linear models such as Lasso and Ridge Regression showed limited success, whereas nonlinear models, including Random Forest, XGBoost, and Neural Networks, performed significantly better, with ensemble methods delivering the best results. The top-performing ensemble model achieved a mean absolute error of 45.7 years, surpassing traditional techniques. This study demonstrates the potential of named entities as temporal indicators and the effectiveness of ensemble learning in capturing complex historical patterns, offering a scalable, non-destructive alternative to traditional methods.
Bandara et al. studied this question.
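To make the idea of named entities as temporal markers concrete, the following is a minimal sketch and not the authors' pipeline. It assumes the regression target is the midpoint of the earliest and latest possible dates (the abstract does not say how the two bounds are combined), and it swaps the paper's Random Forest, XGBoost, and neural models for a deliberately crude baseline: each name is assigned the mean date of the training texts that mention it, and a text is dated by averaging over its names. The records, the function names, and the `default` fallback are all invented for illustration.

```python
from statistics import mean

# Toy records, invented for illustration only (NOT Trismegistos data).
# Each entry: (named entities in the text, earliest year, latest year);
# negative years stand for BCE.
TRAIN = [
    ({"Ptolemaios", "Alexandria"}, -250, -200),
    ({"Ptolemaios", "Memphis"}, -220, -180),
    ({"Aurelius", "Oxyrhynchos"}, 200, 260),
    ({"Aurelius", "Alexandria"}, 180, 240),
]


def midpoint(earliest, latest):
    """Assumed regression target: the centre of the possible date range."""
    return (earliest + latest) / 2


def fit_entity_dates(records):
    """Map each entity to the mean midpoint of the texts that mention it."""
    dates = {}
    for entities, earliest, latest in records:
        for name in entities:
            dates.setdefault(name, []).append(midpoint(earliest, latest))
    return {name: mean(values) for name, values in dates.items()}


def predict(entity_dates, entities, default):
    """Date a text by averaging the dates of the entities seen in training."""
    known = [entity_dates[name] for name in entities if name in entity_dates]
    return mean(known) if known else default


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap, in years, between true and predicted dates."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    model = fit_entity_dates(TRAIN)
    guess = predict(model, {"Aurelius", "Oxyrhynchos"}, default=0.0)
    print(guess, mean_absolute_error([230.0], [guess]))
```

A name such as "Alexandria" that recurs across centuries receives an uninformative average date under this baseline, which is one intuition for why models that can capture interactions between names would be expected to do better than simple additive ones.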