Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) downstream tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. These tasks are known to suffer from data imbalance, particularly in the ratio of positive to negative examples and in class disparities. This paper investigates an often-overlooked issue of encoder models: the position bias of positive examples in token classification. We propose an evaluation approach to investigate position bias in transformer models with different position embedding techniques. We show that LMs can suffer from this bias, with an average drop in performance ranging from 3% to 5%. We propose two methods, Random Position Shifting and Context Perturbation, which we apply to batches during training. The results show an improvement of ≈ 2% in model performance on CoNLL03, UD_en, and TweeBank.
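The abstract does not spell out how Random Position Shifting is implemented. As a rough illustration only, the sketch below assumes one plausible reading: each sequence in a batch receives a random offset added to its position ids, so that labeled tokens are not always seen at the same absolute positions. The function name `random_position_shift` and the parameters `seq_lengths` and `max_positions` are hypothetical and do not come from the paper.

```python
import random


def random_position_shift(seq_lengths, max_positions, rng=None):
    """Return shifted position ids for each sequence in a batch.

    Assumption (not from the paper): a single random offset is drawn per
    sequence, and the offset is bounded so that no position id reaches
    max_positions (the size of the position embedding table).
    """
    rng = rng or random.Random()
    batch_position_ids = []
    for length in seq_lengths:
        if length > max_positions:
            raise ValueError("sequence longer than max_positions")
        # Largest offset that keeps every position id below max_positions.
        max_offset = max_positions - length
        offset = rng.randint(0, max_offset)
        # Position ids stay consecutive; only their starting point moves.
        batch_position_ids.append(list(range(offset, offset + length)))
    return batch_position_ids


if __name__ == "__main__":
    # Two sequences of 3 and 5 tokens, position table of size 16.
    for ids in random_position_shift([3, 5], max_positions=16):
        print(ids)
```

In a training loop, position ids produced this way would be passed to the model in place of the default 0-based ids; whether the authors draw the offset per sequence or per batch is not stated here.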
Amor et al. (Mon,) studied this question.
www.synapsesocial.com/papers/68e6fffcb6db64358767a141 — DOI: https://doi.org/10.1145/3605098.3636126
Mehdi Ben Amor
Michael Granitzer
Jelena Mitrović
University of Passau