Named Entity Recognition (NER) is a core task in Natural Language Processing (NLP) that identifies and classifies named entities such as persons, places, organizations, dates, and other entities of interest in text. Although NER systems perform well for widely studied languages such as English, their effectiveness for Indian languages remains limited. Marathi, a major Indo-Aryan language written in the Devanagari script, poses particular difficulties: rich morphology, extensive inflection, flexible word order, and the absence of capitalization. These characteristics, together with the lack of large annotated datasets and standardized tools, make NER for Marathi particularly challenging. This paper discusses the linguistic and computational issues encountered in developing NER systems for Marathi. It examines the impact of morphological variation, lexical ambiguity, orthographic inconsistencies, data scarcity, and domain variation on NER performance. The study concludes by emphasizing the importance of language-specific modelling, corpus development, and the adoption of advanced deep learning techniques for improving Marathi NER systems.
Rajendra et al. (Fri,) studied this question.