June 18, 2025 · Open Access

Development of novel Urdu language part of speech tagging system for improved Urdu text classification

Abstract

Humans grasp a long text quickly by processing the part of speech (POS) of its words. The success of a natural language processing system likewise depends on an optimal AI classifier and a well-designed POS tagging approach, and improved Urdu text classification depends on the choice of algorithm and an optimal tagging system. Support Vector Machine (SVM) and its latest hyperparameter-tuned variants are effective classifiers because the decision function depends only on a subset of the training points, the support vectors. To improve results further, different variants and hyperparameter settings are used. SVM variants, Random Forest, and an ensemble model are tested on the same corpus with and without the proposed POS tagger, with POS tagging serving as a form of feature engineering. The corpus is divided into four categories: sports, entertainment, science and technology, and business and economics. The customized POS tagger improves the results dramatically. With POS tagging, the average accuracy, precision, recall, and F1-score of the SVM variants are above 92%, and the Random Forest classifiers exceed 95% on all metrics. An average above 98% across all four metrics is also reported. Overall, POS tagging significantly improves model performance. There is a special focus on improved POS-based text categorization for deploying quick, lightweight models. Emerging discriminative and generative AI algorithms with improved architectures may make better use of POS taggers for more complex natural language text classification. Rule-based POS taggers use handcrafted linguistic rules, whereas machine-learning-based taggers learn patterns from annotated corpora.
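The idea of a rule-based tagger feeding POS tags into a classifier as extra features can be sketched as follows. This is a minimal illustration, not the tagger proposed in the paper: the romanized Urdu lexicon, the suffix rule, the coarse tag names, and the function names `rule_based_tag` and `pos_features` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical closed-class lexicon (romanized Urdu, illustrative only).
LEXICON = {
    "hai": "AUX",
    "mein": "ADP",
    "ka": "ADP",
    "ne": "ADP",
    "aur": "CCONJ",
}


def rule_based_tag(tokens):
    """Tag each token by lexicon lookup, then a suffix rule, then a default."""
    tagged = []
    for tok in tokens:
        word = tok.lower()
        if word in LEXICON:
            tag = LEXICON[word]
        elif len(word) > 3 and word.endswith("na"):
            # Assumed handcrafted rule: infinitive verbs end in "-na".
            tag = "VERB"
        else:
            # Assumed fallback: treat unknown words as nouns.
            tag = "NOUN"
        tagged.append((tok, tag))
    return tagged


def pos_features(text):
    """Bag-of-words counts augmented with one count feature per POS tag."""
    tokens = text.split()
    feats = Counter(tok.lower() for tok in tokens)
    for _, tag in rule_based_tag(tokens):
        feats["POS=" + tag] += 1
    return feats


tagged = rule_based_tag("Pakistan mein cricket khelna".split())
features = pos_features("Pakistan mein cricket khelna")
print(tagged)
print(dict(features))
```

The resulting feature dictionary could then be vectorized and passed to any classifier, so the same model can be trained with and without the `POS=` features to compare the two settings.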
