June 12, 2026 · Open Access

Comparative Analysis of Machine Learning Algorithms for Mental Health Prediction Using Social Media Data

Key Points

Compared the effectiveness of different machine learning algorithms for predicting mental health outcomes from social media data.
Analyzed 26,945 social media posts from Reddit and Twitter.
Developed a complete NLP pipeline, including tokenization and TF-IDF feature vectorization.
Implemented five machine learning algorithms: Logistic Regression, Naive Bayes, SVM, Random Forest, and XGBoost.
XGBoost achieved the highest accuracy (93.2%) with an AUC-ROC of 0.967.
Evaluated performance with confusion matrix and ROC curve analyses.
Identified ethical considerations, including data privacy and algorithmic bias.
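
The Key Points name tokenization and TF-IDF vectorization without further detail. Below is a minimal, standard-library sketch of such a preprocessing step, under stated assumptions: the regex tokenizer and the tiny STOP_WORDS set are hypothetical, lemmatization is omitted, and IDF uses the smoothed form idf(t) = ln((1 + n) / (1 + df(t))) + 1 with L2-normalized rows (the scikit-learn TfidfVectorizer default). It is not the paper's own implementation, whose exact settings are not given here.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"i", "a", "an", "the", "and", "to", "of", "is", "am", "my", "it", "so"}

# Assumed tokenizer: runs of lowercase letters and apostrophes count as tokens.
TOKEN_RE = re.compile(r"[a-z']+")


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, and drop stop words (lemmatization omitted)."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (sorted vocabulary, one L2-normalized TF-IDF row per document).

    Term frequency is the raw count; IDF is smoothed:
    idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        # Scale to unit length; leave all-zero rows (e.g. only stop words) unchanged.
        rows.append([v / norm for v in row] if norm else row)
    return vocab, rows


# Hypothetical example posts.
posts = [
    "I feel so hopeless and tired",
    "Great run this morning, feeling energized",
    "Tired of feeling hopeless every day",
]
vocab, X = tfidf(posts)
```

Here `hopeless` occurs in two of the three posts, so its smoothed IDF, ln(4/3) + 1 ≈ 1.29, is lower than that of `feel`, ln(4/2) + 1 ≈ 1.69, which occurs in only one. Rarer terms therefore carry more weight in each post's vector.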

Abstract

This paper compares five machine learning algorithms for automated mental health prediction from social media text: Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Random Forest, and XGBoost. Using a combined dataset of 26,945 posts from Reddit and Twitter, the study develops a complete NLP pipeline comprising tokenization, lemmatization, stop-word removal, and TF-IDF feature vectorization. XGBoost achieved the highest accuracy (93.2%) with an AUC-ROC of 0.967. The study also reports confusion matrix analysis, ROC curves, cross-validation stability testing, and SHAP-based explainability, and discusses ethical considerations including data privacy, algorithmic bias, and responsible AI. It closes by outlining limitations and future directions, including BERT, Federated Learning, and multimodal analysis.
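
The abstract reports accuracy, AUC-ROC, confusion matrices, and ROC curves. A minimal standard-library sketch of how these metrics relate is given below, assuming a binary task (1 = post labeled as indicating a mental health condition, 0 = not) and a classifier that outputs a probability per post. The function names, labels, scores, and 0.5 threshold are hypothetical; the paper may use multi-class labels or library routines such as scikit-learn's `confusion_matrix` and `roc_auc_score`.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary 0/1 labels, where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true label."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


def roc_points(y_true: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs, sweeping the threshold down through each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2).

    This pairwise statistic equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]          # hypothetical model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
```

On this toy data the confusion matrix is tn = 2, fp = 1, fn = 1, tp = 2, giving an accuracy of 4/6, while the AUC is 8/9 ≈ 0.889 and matches the trapezoidal area under `roc_points`. Because AUC is computed from the scores rather than the thresholded labels, it measures ranking quality independently of the chosen decision threshold.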
