This paper investigates the problem of cross-lingual named entity recognition (NER), which involves automatically identifying entities such as persons, organizations, locations, and other structured elements in text. High-quality NER typically requires manually annotated corpora; however, for many low-resource languages, such data are scarce and costly to produce. The study addresses the following question: can annotated sentences in one language be used to transfer NER markup to their machine-translated counterparts in other languages? To explore this, we propose an approach based on a large language model (LLM) that performs two tasks simultaneously: translating a source sentence and generating BIOES-formatted entity tags for the translated output. To improve robustness and reduce semantic drift, a back-translation step is incorporated to verify meaning preservation by comparing the reconstructed source sentence with the original. The proposed method is compared with two baseline approaches: (1) annotation projection via machine translation and (2) automatic tagging using pre-existing NER tools. Performance is evaluated using standard metrics, including precision, recall, and F1-score. Experimental results demonstrate that the LLM-based approach provides a practical and efficient mechanism for transferring NER annotations across languages. While the method achieves strong and balanced performance, its quality remains influenced by translation accuracy and adherence to annotation constraints. Methodologically, the approach can be considered relatively language-independent, as it relies on general LLM capabilities, a universal tagging scheme, and multilingual semantic representations rather than language-specific model training.
Barakhnin et al. studied this question.
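To make the BIOES tagging scheme and the evaluation metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes entities are given as token-index spans and that precision, recall, and F1 are computed at the entity level with exact span-and-type matching; the abstract does not state which matching criterion was used. The function names `spans_to_bioes`, `bioes_to_spans`, and `prf1` are hypothetical and introduced only for illustration.

```python
from typing import List, Tuple

# Hypothetical span representation: (start_token, end_token_inclusive, entity_type)
Span = Tuple[int, int, str]


def spans_to_bioes(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as BIOES tags (Begin, Inside, Outside, End, Single)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if start == end:
            tags[start] = f"S-{label}"          # single-token entity
        else:
            tags[start] = f"B-{label}"          # first token
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"          # interior tokens
            tags[end] = f"E-{label}"            # last token
    return tags


def bioes_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIOES tags into spans, silently dropping malformed sequences."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, kind = tag.partition("-")
        if prefix == "S":
            spans.append((i, i, kind))
            start = None
        elif prefix == "B":
            start, label = i, kind
        elif prefix == "I":
            if start is None or kind != label:
                start = None                    # I- without a matching B-
        elif prefix == "E":
            if start is not None and kind == label:
                spans.append((start, i, kind))
            start = None
        else:                                   # "O" or anything unrecognized
            start = None
    return spans


def prf1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact match (an assumption)."""
    g, p = set(gold), set(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Usage: "Angela Merkel visited Paris" with one PER and one LOC entity.
gold_spans = [(0, 1, "PER"), (3, 3, "LOC")]
gold_tags = spans_to_bioes(4, gold_spans)       # ['B-PER', 'E-PER', 'O', 'S-LOC']
pred_spans = bioes_to_spans(["B-PER", "E-PER", "S-LOC", "O"])
scores = prf1(gold_spans, pred_spans)           # (0.5, 0.5, 0.5)
```

Dropping malformed tag sequences during decoding is one simple way to enforce the annotation constraints mentioned above: an LLM output that violates the BIOES grammar then counts against recall rather than producing a spurious entity.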