Named entity recognition (NER) in ancient Chinese texts underpins their further development and use. Previous studies have largely taken a data-driven approach that relies on the semantic features of the text. As linguistic resources and textual data for ancient Chinese continue to accumulate, a major open challenge is how to exploit both these data and the associated lexical knowledge, with the help of new-generation information technology, to improve semantic comprehension and NER performance. To this end, this paper proposes NERM, a named entity recognition model for ancient Chinese text that fuses explicit and implicit features, which are extracted using a pre-trained model and a multi-head attention mechanism. The explicit features are the semantic features of the text, extracted with the GuwenBERT model. The implicit features comprise relative positional relations, part-of-speech, and character radicals. Experiments on the GuNER 2023 corpus show that NERM achieves an F1 score of 90.67%, outperforming existing models. Ablation experiments show that adding the implicit features yields a modest but meaningful improvement over using the explicit features alone, and that the implicit features can be arranged in order of importance as follows: character radicals, part-of-speech, and relative positional relations.
Liu et al. (Thu,) studied this question.
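To make the reported F1 score concrete, the following is a minimal sketch of how an entity-level F1 could be computed from tag sequences. It assumes a BIO tagging scheme and exact-match scoring (an entity counts as correct only if its span and type both match). The source does not state the tagging scheme or scoring protocol used for GuNER 2023, and the function names and the `PER`/`LOC` labels are hypothetical, chosen only for illustration.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence.

    Assumption: tags look like "B-PER", "I-PER", "O". Stray "I-" tags that
    do not continue an open entity of the same type are ignored.
    """
    entities = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            entities.add((start, i, etype))  # end index is exclusive
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        else:
            start, etype = None, None
    return entities


def entity_f1(gold_tags, pred_tags):
    """Exact-match entity-level F1 between gold and predicted tag sequences."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-character sentence: the person name is found correctly,
# but the location is predicted one character too late.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "O", "B-LOC"]
score = entity_f1(gold, pred)
```

In this toy case one of two predicted entities matches one of two gold entities, so precision and recall are both 0.5. Scoring at the entity level rather than per character penalizes boundary errors, which is why the misplaced location counts as both a false positive and a false negative.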