What question did this study set out to answer?

The aim is to enhance word sense disambiguation (WSD) by integrating bilingual corpus knowledge into language models.

June 4, 2026 | Open Access

Word Sense Disambiguation Algorithm for Pre-trained Language Models Integrating Bilingual Corpus Knowledge

Key Points

Constructed a multi-domain bilingual parallel corpus with denoising and alignment processes.
Designed a deep cross-lingual knowledge integration framework with an attention mechanism.
Fine-tuned the XLM-R model specifically for the WSD task.
The proposed algorithm outperformed baseline models like BERT and RoBERTa in handling polysemous words.
Performance improvements were particularly notable in low-resource language scenarios.

Abstract

Word sense disambiguation (WSD) affects the accuracy of semantic understanding in downstream tasks such as machine translation and cross-lingual information retrieval. Existing monolingual pre-trained models lack cross-lingual semantic constraints, and traditional methods integrate bilingual knowledge only superficially; both limit disambiguation. To address these bottlenecks, this paper proposes a WSD algorithm for pre-trained language models that integrates bilingual corpus knowledge. First, a multi-domain bilingual parallel corpus is constructed and processed with denoising, GIZA++-based lexical alignment, and attention-optimized sentence-level alignment. Second, a deep cross-lingual knowledge integration framework is designed by embedding a cross-lingual attention mechanism into the XLM-R base model. Finally, the model is fine-tuned for the disambiguation task and outputs a probability distribution over sense categories for the target word. Experiments on the Senseval-3 and SemEval-2013 datasets show that the proposed algorithm outperforms baseline models such as BERT, RoBERTa, and XLM-R, with the most pronounced gains on polysemous words and low-resource languages.
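
As a rough illustration of the cross-lingual attention and sense-probability output that the abstract describes, the following is a minimal sketch in plain Python. The abstract does not specify the architecture, so the function names, the toy vectors, the concatenation-based fusion, and the linear sense scorer are all assumptions made for illustration; a real system would presumably take these vectors from XLM-R hidden states rather than hand-written lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_lingual_attention(query, parallel_vectors):
    """Scaled dot-product attention of one source-side vector (the target
    word) over the token vectors of the aligned target-language sentence.
    Returns the attended context vector and the attention weights."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in parallel_vectors])
    context = [
        sum(w * k[i] for w, k in zip(weights, parallel_vectors))
        for i in range(len(query))
    ]
    return context, weights


def sense_distribution(target_vec, parallel_vectors, sense_weights):
    """Fuse the target-word vector with its cross-lingual context
    (assumption: simple concatenation) and score each candidate sense
    with a hypothetical linear layer followed by softmax."""
    context, _ = cross_lingual_attention(target_vec, parallel_vectors)
    fused = target_vec + context  # list concatenation, length 2 * d
    return softmax([dot(w, fused) for w in sense_weights])


# Toy example: 2-dimensional vectors, two aligned tokens, two senses.
target_vec = [1.0, 0.0]
parallel_vectors = [[1.0, 0.0], [0.0, 1.0]]
sense_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

_, attn_weights = cross_lingual_attention(target_vec, parallel_vectors)
probs = sense_distribution(target_vec, parallel_vectors, sense_weights)
```

In this toy setup the aligned token most similar to the target word receives the larger attention weight, and the fused representation shifts probability toward the first sense. The point of the sketch is only the data flow: attend over the parallel sentence, fuse, then normalize scores into a distribution over sense categories.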

Cite This Study

Xu et al. studied this question.

synapsesocial.com/papers/6a2116cfd499ed480b16fb2d https://doi.org/10.1016/j.procs.2026.04.189
