May 9, 2026 | Open Access

Cross-Lingual Transfer of Named Entity Markup with Large Language Models

Key Points

Investigated whether annotated sentences in one language can be used to transfer named entity recognition (NER) markup to machine-translated sentences in other languages.
Proposed a method based on a large language model that translates sentences and generates entity tags simultaneously.
Incorporated a back-translation step to verify meaning preservation between the original and reconstructed sentences.
Compared performance with baseline methods such as annotation projection via machine translation and automatic tagging using existing NER tools.
The LLM-based approach achieved strong and balanced performance in transferring NER annotations.
Evaluation metrics included precision, recall, and F1-score, with significant improvements over baseline methods.
Translation accuracy and adherence to annotation constraints influenced the quality of NER tagging.

Abstract

This paper investigates the problem of cross-lingual named entity recognition (NER), which involves automatically identifying entities such as persons, organizations, locations, and other structured elements in text. High-quality NER typically requires manually annotated corpora; however, for many low-resource languages, such data are scarce and costly to produce. The study addresses the following question: can annotated sentences in one language be used to transfer NER markup to their machine-translated counterparts in other languages? To explore this, we propose an approach based on a large language model (LLM) that performs two tasks simultaneously: translating a source sentence and generating BIOES-formatted entity tags for the translated output. To improve robustness and reduce semantic drift, a back-translation step is incorporated to verify meaning preservation by comparing the reconstructed source sentence with the original. The proposed method is compared with two baseline approaches: (1) annotation projection via machine translation and (2) automatic tagging using pre-existing NER tools. Performance is evaluated using standard metrics, including precision, recall, and F1-score. Experimental results demonstrate that the LLM-based approach provides a practical and efficient mechanism for transferring NER annotations across languages. While the method achieves strong and balanced performance, its quality remains influenced by translation accuracy and adherence to annotation constraints. Methodologically, the approach can be considered relatively language-independent, as it relies on general LLM capabilities, a universal tagging scheme, and multilingual semantic representations rather than language-specific model training.
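The BIOES scheme and the evaluation metrics mentioned above can be illustrated with a minimal sketch. It is not the authors' evaluation code: it assumes entity-level exact-match scoring within a single sentence (the abstract does not state the matching criterion), and the function names `bioes_to_spans` and `precision_recall_f1` are hypothetical. In BIOES, B, I, and E mark the beginning, inside, and end of a multi-token entity, S marks a single-token entity, and O marks tokens outside any entity.

```python
from typing import List, Set, Tuple

# (start_index, end_index_inclusive, entity_type); a hypothetical representation.
Span = Tuple[int, int, str]


def bioes_to_spans(tags: List[str]) -> Set[Span]:
    """Decode one sentence of BIOES tags into entity spans.

    Ill-formed fragments (for example an I- or E- tag with no matching
    B- tag) are dropped, which is one simple way to enforce the
    annotation constraints of the scheme.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S" and label:
            spans.add((i, i, label))
            start, etype = None, None
        elif prefix == "B" and label:
            start, etype = i, label
        elif prefix == "I" and start is not None and label == etype:
            continue  # still inside the currently open entity
        elif prefix == "E" and start is not None and label == etype:
            spans.add((start, i, label))
            start, etype = None, None
        else:
            # "O" or an ill-formed tag: discard any open entity.
            start, etype = None, None
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact span and type match."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative tags for a four-token sentence (made-up example, not from the paper).
gold_tags = ["B-PER", "E-PER", "O", "S-LOC"]
pred_tags = ["B-PER", "E-PER", "O", "O"]

gold_spans = bioes_to_spans(gold_tags)
pred_spans = bioes_to_spans(pred_tags)
precision, recall, f1 = precision_recall_f1(gold_spans, pred_spans)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example the prediction recovers the person entity but misses the location, so precision is 1.0 while recall is 0.5. Scoring a whole corpus this way would additionally require a sentence identifier in each span so that spans from different sentences do not collide.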
