Purpose
This paper addresses challenges in named entity recognition (NER) for ancient Chinese texts, including semantic complexity, ambiguous entity boundaries and syntactic divergence from modern simplified Chinese, by proposing GARNET, a robust framework that enhances NER accuracy and thereby advances structured knowledge extraction and digital humanities research.

Design/methodology/approach
This study integrates a domain-specific pre-trained model (GujiRoBERTa), word-pair relation modelling (W2NER), sliding-window data augmentation and ensemble learning into a new framework, GARNET. The W2NER layer uses dilated convolutions and biaffine transformations to model intra-entity structural and semantic relationships. Experiments are conducted on three datasets (Records of the Grand Historian, Twenty-Four Histories and traditional Chinese medicine classics), using fivefold cross-validation, sliding-window data augmentation and ensemble strategies for performance evaluation.

Findings
GARNET achieves state-of-the-art F1 scores of 85.04% (Records of the Grand Historian), 90.28% (Twenty-Four Histories) and 84.49% (traditional Chinese medicine classics), yielding an overall improvement of 6.18% over the baseline model. Model comparison experiments confirm the contributions of the core components: W2NER improves boundary detection with an average F1 gain of 5.73%, while the ensemble strategy reduces prediction bias and stabilizes performance. Furthermore, ablation studies show that the proposed sliding-window data augmentation mechanism helps identify low-resource and low-frequency entities, improving overall recognition performance with an F1 gain of 3.27%.

Originality/value
This study pioneers the application of W2NER to NER in ancient Chinese texts, addressing boundary ambiguities through structural analysis.
The sliding-window data augmentation mechanism is particularly effective for identifying low-resource and low-frequency entities. The ensemble strategy not only proves effective within the source domain but also successfully transfers its advantages to unseen data. The proposed framework provides a novel solution for extracting structured knowledge from ancient Chinese texts, with implications for historical research and cultural heritage digitization.
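As a rough illustration of the sliding-window idea mentioned above, the following minimal sketch cuts one annotated sentence into overlapping windows, so that each entity can appear in several training samples with different surrounding context. The abstract does not specify how GARNET builds its windows, so the function name `sliding_windows`, the `size` and `stride` parameters, the character-level span format and the rule of dropping entities that cross a window edge are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start, end_exclusive, label), character offsets.
Span = Tuple[int, int, str]
Sample = Tuple[str, List[Span]]


def sliding_windows(chars: str, spans: List[Span], size: int, stride: int) -> List[Sample]:
    """Cut one annotated sentence into overlapping windows.

    Assumption: an entity is kept in a window only if it lies entirely
    inside that window; its offsets are shifted to be window-relative.
    Choosing stride > size would skip characters between windows.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")

    samples: List[Sample] = []
    n = len(chars)
    start = 0
    while True:
        end = min(start + size, n)
        # Keep only entities fully contained in [start, end).
        kept = [
            (s - start, e - start, label)
            for s, e, label in spans
            if s >= start and e <= end
        ]
        samples.append((chars[start:end], kept))
        if end >= n:  # the last window has reached the end of the sentence
            break
        start += stride
    return samples


if __name__ == "__main__":
    # Toy sentence with two illustrative person-name spans (labels are hypothetical).
    text = "太史公曰余读孔氏书"
    gold = [(0, 3, "PER"), (6, 8, "PER")]
    for window, entities in sliding_windows(text, gold, size=5, stride=2):
        print(window, entities)
```

In this sketch a smaller `stride` produces more overlapping samples per sentence, which is one plausible way such augmentation could increase the exposure of low-frequency entities. Whether the authors use character-level windows, this containment rule or these parameter values is not stated in the abstract.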
Yang et al. (Sat,) studied this question.