The rapid expansion of mobile government (m-Government) platforms in Saudi Arabia has generated large volumes of user feedback, creating an opportunity for systematic, data-driven evaluation of public digital services. This study conducts a large-scale sentiment analysis of Arabic user reviews collected from five major Saudi m-Government applications: Absher Business, Tawakkalna, Sehhaty, Nusuk, and Najiz. A dataset of 84,000 reviews was constructed and labeled with binary sentiment (positive or negative). Five Arabic transformer-based baseline models (AraBERT, ArabicBERT, CAMeLBERT, SaudiBERT, and MARBERT) were evaluated under a unified experimental framework. Among these, SaudiBERT and MARBERT achieved the strongest performance, with MARBERT obtaining an accuracy of 91.2%, an F1-score of 0.858, and an AUC of 0.942. Furthermore, parameter-efficient fine-tuning of MARBERT with QLoRA achieved comparable performance (F1 = 0.854) while substantially reducing computational requirements. These findings demonstrate the feasibility of scalable sentiment analysis for evaluating and improving m-Government services.
Thamer Alshammari studied this question.
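The three metrics reported above (accuracy, F1-score, and AUC) can be illustrated for a binary positive/negative task with a minimal sketch. This is not the study's evaluation code: the function name `binary_metrics`, the 0.5 decision threshold, the choice of positive-class F1 (rather than macro or weighted F1), and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff; the study's threshold is not stated
) -> Dict[str, float]:
    """Accuracy, positive-class F1, and ROC AUC for labels in {0, 1}."""
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("y_true and y_score must be non-empty and equal length")

    # Hard predictions from scores (1 = positive sentiment).
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)

    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    # AUC as the probability that a random positive example is scored above
    # a random negative one, counting ties as one half.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "f1": f1, "auc": auc}


# Toy example: six reviews with made-up labels and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(binary_metrics(labels, scores))
```

The pairwise AUC computation is quadratic in the number of examples, which is fine for a demonstration but would be replaced by a rank-based computation on a dataset of 84,000 reviews.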