This paper compares five machine learning algorithms for automated mental health prediction from social media text: Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Random Forest, and XGBoost. Using a combined dataset of 26,945 posts from Reddit and Twitter, a complete NLP pipeline was developed, comprising tokenization, lemmatization, stop-word removal, and TF-IDF feature vectorization. XGBoost achieved the highest accuracy, 93.2%, with an AUC-ROC of 0.967. The study includes confusion matrix analysis, ROC curves, cross-validation stability testing, SHAP-based explainability, and a discussion of ethical considerations, including data privacy, algorithmic bias, and responsible AI. Limitations and future directions, including BERT, Federated Learning, and multimodal analysis, are outlined.
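The preprocessing and vectorization steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the example posts, and the function names `tokenize` and `tfidf` are hypothetical, lemmatization is omitted for brevity, and the smoothed IDF formula is one common variant chosen here as an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "i", "is", "the", "to", "of", "so", "my", "after"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term frequency times a smoothed IDF,
    ln((1 + N) / (1 + df)) + 1, followed by L2 normalisation.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Made-up example posts, not drawn from the study's dataset.
posts = [
    "I feel anxious and tired",
    "Feeling great after the run",
    "So tired of feeling anxious",
]
vectors = tfidf(posts)
```

In this sketch, a term that appears in fewer posts receives a larger weight than an equally frequent term that appears in many, which is the property that makes TF-IDF features useful as classifier input.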
Diya Prithiani (Wed,) studied this question.