Meaning Conflation Deficiency (MCD) is a major problem in Natural Language Processing (NLP), especially for low-resource, morphologically rich languages such as Sesotho sa Leboa. Traditional word embeddings assign each word a single vector and so cannot distinguish between a word's multiple senses, which frequently degrades performance in downstream tasks such as Word Sense Disambiguation (WSD). This study evaluates how well context-aware and multi-prototype word embedding models, including ELMo, GPT-2, BERT, Universal Sentence Encoder, and hybrid versions of Doc2Vec and SBERT, resolve MCD. The models were trained and tested on a sense-annotated Sesotho sa Leboa corpus and assessed using standard classification measures (precision, recall, F1-score, and accuracy), clustering-based metrics, and visualisation techniques. The results show that deep contextual models, in particular ELMo and GPT-2, perform noticeably better in accuracy and sense separation than static and unsupervised models. ELMo achieved the highest F1-score of any model (93%) and, with well-separated confusion matrices, showed excellent interpretability. These findings indicate that context-aware architectures provide reliable solutions to MCD as well as a scalable framework for improving WSD in low-resource language applications. The work offers new benchmarks and insights for future studies on semantic disambiguation in under-represented languages.
Masethe et al. studied this question.