What question did this study set out to answer?

To develop a reliable mood-state classification model for bipolar disorders using unlabeled social media data.

April 10, 2026 | Open Access

Two-Stage self-labeling and multi-objective optimization for bipolar mood-state classification

Key Points

Implemented a two-stage framework for mood-state classification.
Used Flan-T5 self-consistency for generating pseudo-labels from unlabeled posts.
Conducted multi-objective optimization to balance macro-F1, worst-class F1, and latency using Bayesian optimization.
Achieved 0.870 accuracy and 0.863 macro-F1 on a clinician-labeled benchmark.
Improved worst-class robustness (Min-F1) from 0.165 to 0.830.
Achieved a 33% reduction in inference latency while optimizing classification metrics.
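
The self-consistency step in the key points can be illustrated with a minimal sketch: draw several label proposals per post, take the majority vote, and keep the post only if agreement is high. The sampler below is a hypothetical keyword-based stand-in, not Flan-T5, and the names `self_consistency_label`, `make_toy_sampler`, `n_samples`, and `min_agreement` (with its 0.8 default) are assumptions made for illustration; the paper's actual sample count and agreement threshold are not stated here.

```python
import random
from collections import Counter
from typing import Callable, Optional, Tuple

LABELS = ("mania", "depression", "normal")


def self_consistency_label(
    post: str,
    sample_label: Callable[[str], str],
    n_samples: int = 5,
    min_agreement: float = 0.8,
) -> Optional[Tuple[str, float]]:
    """Majority vote over sampled labels; return None for low-agreement posts."""
    votes = Counter(sample_label(post) for _ in range(n_samples))
    label, count = votes.most_common(1)[0]
    agreement = count / n_samples
    return (label, agreement) if agreement >= min_agreement else None


def make_toy_sampler(seed: int = 0, noise: float = 0.2) -> Callable[[str], str]:
    """Hypothetical stand-in for one stochastic model generation (not Flan-T5)."""
    rng = random.Random(seed)

    def sample(post: str) -> str:
        if rng.random() < noise:
            return rng.choice(LABELS)  # simulate sampling variability
        text = post.lower()
        if "hopeless" in text:
            return "depression"
        if "unstoppable" in text:
            return "mania"
        return "normal"

    return sample


if __name__ == "__main__":
    sampler = make_toy_sampler(seed=0, noise=0.2)
    posts = [
        "I feel hopeless today",
        "I am unstoppable, no sleep needed",
        "Went for a walk",
    ]
    for post in posts:
        # None means the post is discarded from the pseudo-labeled set.
        print(post, "->", self_consistency_label(post, sampler))
```

Raising `min_agreement` trades coverage for label purity: fewer posts survive the filter, but those that do carry more consistent pseudo-labels.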

Abstract

Social-media platforms provide abundant signals related to mood disorders, yet building reliable supervised models is hindered by limited expert annotations and heterogeneous, noisy language. This paper introduces a two-stage framework for mood-state classification (mania, depression, normal) that leverages large-scale unlabeled posts while preserving evaluation rigor on a strictly held-out clinician-labeled benchmark (G^500ₓ₄ₒₓ). In Stage 1, we generate pseudo-labels using a Flan-T5 self-consistency scheme that samples multiple label proposals per post and aggregates them by majority vote to retain high-agreement instances. This yields markedly cleaner supervision, reaching 0.870 accuracy and 0.863 macro-F1 on G^500ₓ₄ₒₓ, improving over the strongest labeling baselines (0.538 accuracy and 0.446 macro-F1) by +0.332 and +0.417 absolute points (+61.7% and +93.5%, respectively). Importantly, worst-class robustness (Min-F1) increases from 0.165 to 0.830 (+0.665 absolute; 5.03×, i.e., +403%), clarifying that the large relative gain is driven by a low baseline Min-F1. In Stage 2, we cast model selection as a multi-objective optimization problem that jointly maximizes macro-F1 and worst-class F1 while minimizing inference latency, and solve it using Bayesian optimization with qEHVI (via BoTorch). The optimized configurations yield +4.9% macro-F1 and +7.3% minimum F1 with a 33% latency reduction relative to an untuned baseline (0.803 macro-F1, 0.772 Min-F1, latency 138.6), providing a practical accuracy–efficiency trade-off. To quantify uncertainty and confirm that observed improvements are statistically supported, we perform paired significance analyses on G^500ₓ₄ₒₓ and report 95% bootstrap confidence intervals. Extensive experiments reveal Pareto-optimal solutions that are appropriate for deployment under resource constraints and demonstrate steady improvements across evaluation metrics.
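
To make the three Stage 2 objectives concrete, the sketch below computes macro-F1 and Min-F1 from predictions and then filters a set of candidate configurations down to its Pareto front (higher macro-F1, higher Min-F1, lower latency). It is a plain exhaustive filter, not the qEHVI-based Bayesian optimization described in the abstract. Only the baseline row uses figures from the abstract; the other candidates, and all function and class names, are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


def per_class_f1(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, float]:
    """F1 per class, using F1 = 2TP / (2TP + FP + FN); 0.0 if undefined."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_and_min_f1(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Tuple[float, float]:
    """Return (macro-F1, worst-class F1)."""
    f1 = per_class_f1(y_true, y_pred, labels)
    return sum(f1.values()) / len(f1), min(f1.values())


@dataclass(frozen=True)
class Candidate:
    name: str
    macro_f1: float  # maximize
    min_f1: float    # maximize
    latency: float   # minimize; unit as reported by the benchmark


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    no_worse = (
        a.macro_f1 >= b.macro_f1 and a.min_f1 >= b.min_f1 and a.latency <= b.latency
    )
    better = a.macro_f1 > b.macro_f1 or a.min_f1 > b.min_f1 or a.latency < b.latency
    return no_worse and better


def pareto_front(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


if __name__ == "__main__":
    candidates = [
        Candidate("untuned_baseline", 0.803, 0.772, 138.6),  # from the abstract
        Candidate("cfg_a", 0.840, 0.820, 95.0),              # hypothetical
        Candidate("cfg_b", 0.850, 0.800, 120.0),             # hypothetical
        Candidate("cfg_c", 0.800, 0.700, 150.0),             # hypothetical
    ]
    for c in pareto_front(candidates):
        print(c)
```

Every configuration on the front represents a different trade-off; choosing among them depends on the latency budget of the deployment target.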

Cite This Study

Issam Zidi studied this question.

synapsesocial.com/papers/69d8948f6c1944d70ce058af https://doi.org/10.1007/s44443-026-00691-w
