What question did this study set out to answer?

This research aims to enhance subjective text classification by leveraging annotator disagreement for better prediction accuracy.

February 8, 2026 | Open Access

Learning from Annotator Disagreement Via Weighted Ensemble Optimisation for Subjective Text Classification

Key Points

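Proposes MO-WEL (Multi-Objective Weighted Ensemble Learning), a framework that jointly optimises ensemble weights and ensemble size
Optimises ensemble weights against F1 score, cross-entropy and Manhattan distance, alongside a regularisation term
Trains candidate predictors on diverse label projections obtained through random sampling or annotator-specific selection
Evaluates on four benchmark datasets: ConvAbuse, HS-Brexit, MD-Agreement and ArMIS
MO-WEL outperforms strong baselines in accuracy
MO-WEL also improves the calibration of predictions
MO-WEL also improves the distributional alignment of its outputs
A case study shows predictions that balance majority correctness with minority annotator perspectives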
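
The idea of a label projection can be illustrated with a small sketch. This summary does not give the paper's exact procedure, so everything below is an assumption made for illustration: the toy `annotations` data, the binary label set, and the helper names `random_projection`, `annotator_projection` and `soft_label` are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy data: each item maps an annotator id to a binary label.
annotations = [
    {"a1": 1, "a2": 0, "a3": 1},
    {"a1": 0, "a2": 0, "a3": 0},
    {"a1": 1, "a2": 1, "a3": 0},
]


def random_projection(items, rng):
    """One hard label per item, drawn uniformly from that item's annotators."""
    return [rng.choice(list(item.values())) for item in items]


def annotator_projection(items, annotator):
    """Labels given by a single annotator; None where they skipped an item."""
    return [item.get(annotator) for item in items]


def soft_label(item, classes=(0, 1)):
    """Fraction of annotators choosing each class (the disagreement signal)."""
    counts = Counter(item.values())
    total = sum(counts.values())
    return [counts[c] / total for c in classes]


rng = random.Random(0)  # fixed seed so the sampled projection is repeatable
print(random_projection(annotations, rng))
print(annotator_projection(annotations, "a2"))
print([soft_label(item) for item in annotations])
```

Each projection yields a different hard-label training set from the same annotations, which is one plausible way to obtain the diverse candidate predictors the key points describe.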

Abstract

Subjective text classification tasks, such as abuse detection and stance analysis, often suffer from high levels of annotator disagreement. Conventional approaches typically collapse these disagreements into a single ground truth, thereby discarding valuable supervision signals. We propose MO-WEL (Multi-Objective Weighted Ensemble Learning), a novel framework that explicitly leverages annotator disagreement by jointly optimising ensemble weights and size under multiple objectives. Candidate predictors are trained on diverse label projections obtained through random sampling or annotator-specific selection, and ensemble weights are optimised with respect to three complementary losses: F1 score, cross-entropy and Manhattan distance, alongside a regularisation term. Experiments on four benchmark datasets (ConvAbuse, HS-Brexit, MD-Agreement and ArMIS) show that MO-WEL consistently outperforms strong baselines in accuracy, calibration, and distributional alignment. A case study further demonstrates that MO-WEL produces predictions that balance majority correctness with minority annotator perspectives, yielding interpretable and reliable outputs. Our findings highlight the importance of modelling annotator diversity and suggest ensemble optimisation as a principled means of incorporating disagreement into subjective NLP tasks.
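
To make the three objectives concrete, the following sketch scores a candidate weight vector for a small ensemble. It is not the authors' implementation. The abstract does not say how the objectives are combined or which optimiser is used, so the code only evaluates them. The binary setting, the weighted averaging of member probabilities, the 0.5 decision threshold, the count-of-non-zero-weights size penalty, the `lam` parameter and all function names are assumptions.

```python
import math


def ensemble_probs(weights, member_probs):
    """Weighted average of each member's P(class=1), per item."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n_items = len(member_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, member_probs)) / total
        for i in range(n_items)
    ]


def f1_score(pred, gold):
    """Binary F1 for the positive class; 0.0 when it is undefined."""
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def objectives(weights, member_probs, soft, hard, lam=0.01, eps=1e-12):
    """Score one weight vector; soft[i] is the share of annotators voting 1."""
    p = ensemble_probs(weights, member_probs)
    pred = [1 if x >= 0.5 else 0 for x in p]  # assumed 0.5 threshold
    # Cross-entropy between the soft annotator label and the ensemble output.
    ce = -sum(
        s * math.log(max(x, eps)) + (1 - s) * math.log(max(1 - x, eps))
        for x, s in zip(p, soft)
    ) / len(p)
    # Manhattan (L1) distance between the two-class distributions.
    md = sum(
        abs(x - s) + abs((1 - x) - (1 - s)) for x, s in zip(p, soft)
    ) / len(p)
    # Assumed regulariser: penalise the number of active ensemble members.
    penalty = lam * sum(1 for w in weights if w > 0)
    return {"f1": f1_score(pred, hard), "cross_entropy": ce,
            "manhattan": md, "penalty": penalty}


# Hypothetical outputs of three candidate predictors on three items.
member_probs = [[0.9, 0.2, 0.6], [0.4, 0.1, 0.8], [0.7, 0.6, 0.3]]
soft = [2 / 3, 0.0, 2 / 3]  # share of annotators choosing class 1
hard = [1, 0, 1]            # majority-vote labels

print(objectives([1, 0, 0], member_probs, soft, hard))
print(objectives([1, 1, 0], member_probs, soft, hard))
```

On this toy data, adding the second predictor leaves F1 unchanged but moves the ensemble output closer to the annotator distribution, at the cost of a larger size penalty. That is the kind of trade-off a multi-objective search over weights and ensemble size would have to resolve.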
