Abstract

Subjective text classification tasks, such as abuse detection and stance analysis, often suffer from high levels of annotator disagreement. Conventional approaches typically collapse these disagreements into a single ground truth, thereby discarding valuable supervision signals. We propose MO-WEL (Multi-Objective Weighted Ensemble Learning), a novel framework that explicitly leverages annotator disagreement by jointly optimising ensemble weights and size under multiple objectives. Candidate predictors are trained on diverse label projections obtained through random sampling or annotator-specific selection, and ensemble weights are optimised with respect to three complementary losses: F1 score, cross-entropy and Manhattan distance, alongside a regularisation term. Experiments on four benchmark datasets (ConvAbuse, HS-Brexit, MD-Agreement and ArMIS) show that MO-WEL consistently outperforms strong baselines in accuracy, calibration, and distributional alignment. A case study further demonstrates that MO-WEL produces predictions that balance majority correctness with minority annotator perspectives, yielding interpretable and reliable outputs. Our findings highlight the importance of modelling annotator diversity and suggest ensemble optimisation as a principled means of incorporating disagreement into subjective NLP tasks.
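To make the objectives named in the abstract concrete, the following is a minimal sketch of how a weighted ensemble might be scored on F1, cross-entropy and Manhattan distance against soft (annotator-distribution) labels. It is not the authors' implementation. The function names, the use of a weighted average of member probabilities, the choice of macro-averaged F1 against the majority label, and the size penalty (a count of non-zero weights scaled by a hypothetical `reg` parameter) are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def ensemble_soft_labels(weights, member_probs):
    """Weighted average of member predictions (assumed combination rule).

    weights: shape (M,), non-negative, at least one non-zero.
    member_probs: shape (M, N, C), per-member class probabilities.
    Returns an array of shape (N, C).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the result is still a distribution
    return np.tensordot(w, member_probs, axes=1)

def objectives(weights, member_probs, soft_targets, reg=0.01):
    """Score one weight vector on the objectives listed in the abstract.

    soft_targets: shape (N, C), the annotator label distribution per item.
    reg: hypothetical coefficient for the ensemble-size penalty.
    """
    p = ensemble_soft_labels(weights, member_probs)
    hard_true = soft_targets.argmax(axis=1)  # majority label
    hard_pred = p.argmax(axis=1)

    # Macro-averaged F1 over classes (assumption: macro averaging).
    f1_per_class = []
    for c in range(p.shape[1]):
        tp = np.sum((hard_pred == c) & (hard_true == c))
        fp = np.sum((hard_pred == c) & (hard_true != c))
        fn = np.sum((hard_pred != c) & (hard_true == c))
        denom = 2 * tp + fp + fn
        f1_per_class.append(2 * tp / denom if denom else 0.0)

    # Cross-entropy between the annotator distribution and the prediction.
    log_p = np.log(np.clip(p, 1e-12, 1.0))
    cross_entropy = -np.mean(np.sum(soft_targets * log_p, axis=1))

    # Mean Manhattan (L1) distance between the two distributions.
    manhattan = np.mean(np.sum(np.abs(soft_targets - p), axis=1))

    return {
        "f1": float(np.mean(f1_per_class)),          # to be maximised
        "cross_entropy": float(cross_entropy),       # to be minimised
        "manhattan": float(manhattan),               # to be minimised
        "size_penalty": reg * int(np.count_nonzero(weights)),
    }
```

A multi-objective optimiser would evaluate many candidate weight vectors with a function like `objectives` and keep those that trade the criteria off well; which optimiser MO-WEL uses is not stated in the abstract, so none is assumed here.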
Cui et al. (Fri,) studied this question.