Depression is a leading cause of disability worldwide, yet many individuals remain undiagnosed because of stigma, limited access to care, or lack of awareness. The growing use of social media offers a new opportunity for passive mental health screening through natural language processing and machine learning, particularly for low-resource languages such as Arabic, which remain underrepresented in the literature. This study develops and evaluates multilingual machine learning models for detecting depression in social media text, using two balanced datasets: an Arabic corpus of 15,000 tweets and an English corpus of 99,590 tweets. The preprocessing pipeline incorporates normalization, negation and intensifier handling, and chi-square-based feature selection, with text represented as Bag-of-Words and TF-IDF features. Random Forest, Linear SVM, and RBF-SVM classifiers were evaluated, with SMOTE applied to address class imbalance. The RBF-SVM with TF-IDF consistently outperformed the other models, achieving an F1-score of 98% and an AUC of 0.996 on Arabic tweets, and an F1-score of 94.2% and an AUC of 0.987 on English tweets. These outcomes highlight the impact of high-quality preprocessing, linguistic augmentation, and expert-verified annotations on classification performance, particularly for Arabic data. The findings demonstrate that optimized traditional machine learning models can surpass more complex deep learning methods for depression detection, and the study contributes benchmark datasets and practical methodologies for advancing cross-lingual mental health informatics.
Abdelmoniem Abdelmoniem Helmy studied this question.
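The feature-engineering steps named in the abstract (negation handling, TF-IDF weighting, and chi-square feature selection) can be illustrated with a minimal, standard-library sketch. It is not the study's implementation: the negator list, the `NOT_` prefixing scheme, the one-token negation scope, the textbook 2x2 chi-square formula, and the four toy English sentences are all assumptions made for illustration, and the function names `tokenize`, `tfidf`, and `chi2_scores` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical negator list; the study's actual lexicon is not given.
NEGATORS = {"not", "no", "never"}

def tokenize(text, scope=1):
    """Lowercase, split on whitespace, and prefix the next `scope` tokens
    after a negator with NOT_ (an assumed negation-handling scheme)."""
    tokens, remaining = [], 0
    for word in text.lower().split():
        if word in NEGATORS:
            remaining = scope
            continue
        if remaining > 0:
            tokens.append("NOT_" + word)
            remaining -= 1
        else:
            tokens.append(word)
    return tokens

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document, using raw
    term frequency times a smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def chi2_scores(docs, labels):
    """Chi-square statistic of each term's presence against a binary
    label, computed from the 2x2 contingency table."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    present_pos, present_neg = Counter(), Counter()
    for doc, y in zip(docs, labels):
        (present_pos if y == 1 else present_neg).update(set(doc))
    scores = {}
    for term in set(present_pos) | set(present_neg):
        a = present_pos[term]   # term present, label 1
        b = present_neg[term]   # term present, label 0
        c = n_pos - a           # term absent, label 1
        d = n_neg - b           # term absent, label 0
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

# Toy data, invented for illustration (1 = depressive, 0 = not).
texts = [
    "i feel hopeless and tired",
    "i am not happy today",
    "i feel happy today",
    "great day with friends",
]
labels = [1, 1, 0, 0]

docs = [tokenize(t) for t in texts]
weights = tfidf(docs)
scores = chi2_scores(docs, labels)

# Keep the k highest-scoring terms, then restrict each TF-IDF vector to them.
k = 4
selected = sorted(scores, key=scores.get, reverse=True)[:k]
reduced = [{t: w for t, w in vec.items() if t in selected} for vec in weights]
```

In this toy corpus, terms such as `feel` and `today` occur equally often in both classes, so their chi-square score is zero and they are never selected, while `happy` and `NOT_happy` are scored as separate features. In a full pipeline like the one the abstract describes, the reduced vectors would then be passed to a classifier such as an RBF-kernel SVM, with SMOTE applied to the training split only.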