Hindi is the most widely spoken language in India, with approximately 528 million native speakers and 600 million total speakers. Yet natural language processing (NLP) resources and pre-trained language models for Hindi remain substantially less developed than those for English, which limits the deployment of AI-driven text analysis applications in governance, healthcare, education, and digital commerce in Hindi-speaking markets. Transformer-based pre-trained language models, particularly multilingual BERT (mBERT) and its Hindi-specific variant Hindi-BERT, offer a transfer learning pathway for Hindi NLP tasks, but systematic comparisons of fine-tuning strategies, domain generalisation, and performance across classification tasks remain limited in the published literature on Indian language NLP. This paper presents a comprehensive evaluation of fine-tuned Hindi-BERT on two text classification tasks: six-class topic categorisation (politics, sports, entertainment, technology, health, business) and three-class sentiment analysis (positive, negative, neutral). The evaluation uses the newly constructed HindiSentiment-6 corpus, a 12,000-document Hindi text dataset scraped from news portals (Dainik Bhaskar, Amar Ujala, Navbharat Times), social media (Twitter/X Hindi accounts), and e-commerce review platforms (Flipkart, Amazon India). Fine-tuned Hindi-BERT achieves 95.6% accuracy and 95.3% macro-F1 on topic classification and 91.2% accuracy on sentiment analysis across five domains, outperforming the TF-IDF+SVM (78.4%), fastText (83.1%), character-level CNN (86.2%), BiLSTM (88.7%), and frozen mBERT (91.3%) baselines. Attention weight visualisation shows that the model assigns high attention to sentiment-bearing words (rohchak: interesting; prernadayak: inspiring; bekar: useless), consistent with human linguistic intuition. The HindiSentiment-6 corpus and fine-tuned model weights are released publicly to support the Hindi NLP research community.
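The abstract reports both accuracy and macro-F1, which can diverge when classes are imbalanced. As a minimal sketch of how the two metrics are conventionally computed (not the authors' evaluation code), the following Python function takes gold and predicted labels and returns both values. The function name `accuracy_and_macro_f1` and the toy label lists are hypothetical and chosen only for illustration; they are not drawn from the HindiSentiment-6 corpus.

```python
def accuracy_and_macro_f1(y_true, y_pred, labels):
    """Return (accuracy, macro-F1) for two equal-length label sequences.

    Macro-F1 is the unweighted mean of the per-class F1 scores, so every
    class counts equally regardless of how many documents it contains.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    macro_f1 = sum(f1_scores) / len(f1_scores)
    return accuracy, macro_f1


# Toy three-class sentiment example (illustrative labels only).
SENTIMENT_LABELS = ["positive", "negative", "neutral"]
gold = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]

acc, f1 = accuracy_and_macro_f1(gold, pred, SENTIMENT_LABELS)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

Because macro-F1 averages per-class scores without weighting by class size, a gap between it and accuracy usually signals that some classes are handled worse than others.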
Kavita Shukla, Meenakshi Bisht, Saurabh Dewangan studied this question.