Word sense disambiguation (WSD) directly affects the accuracy of semantic understanding in downstream tasks such as machine translation and cross-lingual information retrieval. Existing monolingual pre-trained models lack cross-lingual semantic constraints, and traditional methods integrate bilingual knowledge only superficially; both limit disambiguation performance. To address these bottlenecks, this paper proposes a WSD algorithm built on deep cross-lingual knowledge integration. First, a multi-domain bilingual parallel corpus is constructed and then denoised, lexically aligned with GIZA++, and aligned at the sentence level using an attention mechanism. Second, a deep cross-lingual knowledge integration framework is designed in which a cross-lingual attention mechanism is embedded into the XLM-R base model. Finally, the model is fine-tuned for the disambiguation task and outputs a probability distribution over the sense categories of the target word. Experiments on the Senseval-3 and SemEval-2013 datasets show that, compared with baseline models such as BERT, RoBERTa, and XLM-R, the proposed algorithm performs especially well on polysemous words and low-resource languages.
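To make the last two steps more concrete, the following is a minimal sketch of how a cross-lingual attention step could feed a sense classifier. It is not the authors' implementation: it uses plain scaled dot-product attention on toy two-dimensional vectors in place of XLM-R representations, and every name in it (`cross_lingual_attention`, `sense_distribution`, `sense_weights`, the toy vectors) is a hypothetical placeholder introduced for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_lingual_attention(query, keys, values):
    """Scaled dot-product attention (assumed form, not taken from the paper).

    The target-word vector (query) attends over the token vectors of the
    aligned sentence in the other language (keys and values).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


def sense_distribution(target_vec, context_vec, sense_weights):
    """Linear layer plus softmax over sense categories.

    The input is the target-word vector concatenated with the
    cross-lingual context vector; sense_weights holds one row per sense.
    """
    features = target_vec + context_vec
    logits = [dot(features, row) for row in sense_weights]
    return softmax(logits)


# Toy data: one target-word vector and two aligned-sentence token vectors.
target = [1.0, 0.0]
aligned_tokens = [[1.0, 0.0], [0.0, 1.0]]

context, attn = cross_lingual_attention(target, aligned_tokens, aligned_tokens)

# Hypothetical classifier weights for two senses (4 = 2 target + 2 context dims).
sense_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
probs = sense_distribution(target, context, sense_weights)
print(probs)
```

In this toy setup the aligned token most similar to the target word receives the larger attention weight, and the resulting context vector shifts probability toward the first sense. In a real system the vectors would come from the encoder and the classifier weights would be learned during fine-tuning.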
Xu et al. (Thu,) studied this question.