What question did this study set out to answer?

The aim is to improve named entity recognition (NER) performance for ancient Chinese texts by fusing explicit and implicit features.

May 30, 2026 | Open Access

Research on Named Entity Recognition of Ancient Chinese Text by Fusing Explicit Features and Implicit Features

Key Points

Developed a named entity recognition model (NERM) utilizing explicit and implicit feature extraction.
Used the GuwenBERT model to extract semantic features from ancient Chinese texts.
Employed a multi-head attention mechanism for enhancing feature representation.
The model achieved an F1 value of 90.67%, outperforming existing models.
Implicit features were shown to enhance NER performance, with character radicals being the most significant.
Ablation results ranked implicit features in importance: character radicals, part-of-speech, and relative positional relations.
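
The fusion idea in the key points can be illustrated with a toy sketch. It is not the authors' implementation: the explicit vectors below are hand-written stand-ins for GuwenBERT outputs, the embedding tables (`radical_table`, `pos_table`), the scalar encoding of relative position, and all function names are hypothetical, and the attention uses identity projections (Q = K = V) rather than learned weights. It only shows how per-character explicit and implicit features might be concatenated and then mixed by multi-head self-attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(vectors):
    """Scaled dot-product self-attention with identity projections (Q = K = V)."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all position vectors.
        outputs.append(
            [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


def multi_head_self_attention(vectors, num_heads):
    """Split each vector into equal slices, attend per slice, concatenate."""
    dim = len(vectors[0])
    if dim % num_heads != 0:
        raise ValueError("vector size must be divisible by num_heads")
    size = dim // num_heads
    heads = [
        attention_head([v[h * size:(h + 1) * size] for v in vectors])
        for h in range(num_heads)
    ]
    return [[x for head in heads for x in head[t]] for t in range(len(vectors))]


def fuse_features(explicit, radical_ids, pos_ids, rel_positions,
                  radical_table, pos_table):
    """Concatenate explicit and implicit features for each character."""
    return [
        vec + radical_table[r] + pos_table[p] + [rel]
        for vec, r, p, rel in zip(explicit, radical_ids, pos_ids, rel_positions)
    ]


# Toy input for a three-character sentence (all values are made up).
explicit = [[0.1, 0.3, -0.2, 0.5], [0.0, 0.4, 0.1, -0.1], [0.2, -0.3, 0.6, 0.0]]
radical_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}   # hypothetical radical embeddings
pos_table = {0: [0.5], 1: [-0.5]}                # hypothetical part-of-speech embeddings
radical_ids = [0, 0, 1]
pos_ids = [0, 0, 1]
rel_positions = [0.0, 0.5, 1.0]                  # assumed: normalized character index

fused = fuse_features(explicit, radical_ids, pos_ids, rel_positions,
                      radical_table, pos_table)
attended = multi_head_self_attention(fused, num_heads=2)
```

In a real model the attended vectors would feed a tagging layer, and the projections and embedding tables would be learned; the source does not specify those details, so they are left out here.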

Abstract

Named entity recognition (NER) of ancient Chinese texts is the foundation for their development and utilization. Previous studies have focused on data-driven methods that exploit the semantic features of ancient Chinese text. As linguistic resources and textual data for ancient Chinese continue to accumulate, a major open challenge is how to make full use of these data resources and the associated lexical knowledge, with the help of new-generation information technology, to improve semantic comprehension and NER performance. To address this, the paper proposes NERM, a named entity recognition model for ancient Chinese text that fuses explicit features and implicit features, both extracted using a pre-trained model and a multi-head attention mechanism. The explicit features are the semantic features of the text, extracted with the GuwenBERT model. The implicit features are relative positional relations, part-of-speech, and character radicals. Experimental results on the GuNER 2023 corpus show that NERM achieves an F1 value of 90.67%, outperforming the existing models. Ablation experiments show that implicit features provide a modest but meaningful improvement over explicit features alone, and that the implicit features rank in importance as follows: character radicals, part-of-speech, and relative positional relations.
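
For readers unfamiliar with how an F1 value for NER is typically obtained, here is a minimal sketch. It assumes entity-level exact-match scoring over BIO tag sequences, which is a common convention but is not confirmed by the abstract as the evaluation protocol used for GuNER 2023. The tag labels (`PER`, `LOC`) and the function names are hypothetical.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence.

    `end` is exclusive. An I- tag that does not continue an open entity of
    the same type closes any open entity and is otherwise ignored.
    """
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if continues:
            continue
        if start is not None:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def entity_f1(gold_tags, pred_tags):
    """Entity-level F1: a prediction counts only if span and type both match."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction finds one of the two gold entities.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under this convention a partially matched span earns no credit, so boundary errors lower both precision and recall.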
