April 8, 2024 · Open Access

Impact of Position Bias on Language Models in Token Classification

Abstract

Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) downstream tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. These tasks are known to suffer from data imbalance, particularly in the ratio of positive to negative examples and in class disparities. This paper investigates an often-overlooked issue of encoder models: the position bias of positive examples in token classification. We propose an evaluation approach to investigate position bias in transformer models with different position embedding techniques. We show that LMs can suffer from this bias, with an average drop in performance ranging from 3% to 5%. We propose two methods, Random Position Shifting and Context Perturbation, which we apply to batches during training. The results show an improvement of ≈ 2% in the performance of the model on CoNLL03, UD_en, and TweeBank.
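
The abstract names Random Position Shifting without spelling out the procedure. The sketch below follows one plausible reading, which is an assumption and not the paper's confirmed method: each training sequence keeps its tokens and labels but has its position ids offset by a random amount, so that positive examples are seen at varied absolute positions. The names `shift_position_ids` and `shift_batch` and the `max_positions` parameter are hypothetical and introduced here for illustration only.

```python
import random


def shift_position_ids(seq_len, max_positions, rng):
    """Return contiguous position ids for one sequence, offset by a random shift.

    Assumption: the model accepts explicit position ids in [0, max_positions).
    """
    max_shift = max_positions - seq_len
    if max_shift < 0:
        raise ValueError("sequence is longer than max_positions")
    shift = rng.randint(0, max_shift)  # inclusive on both ends
    return [shift + i for i in range(seq_len)]


def shift_batch(lengths, max_positions=512, seed=None):
    """Draw an independent random shift for every sequence in a batch."""
    rng = random.Random(seed)
    return [shift_position_ids(n, max_positions, rng) for n in lengths]


if __name__ == "__main__":
    # Two sequences of 5 and 10 tokens, with a small position budget of 16.
    for ids in shift_batch([5, 10], max_positions=16, seed=0):
        print(ids)
```

In a real training loop, the shifted ids would be passed to the encoder in place of its default `0..n-1` positions. Whether this applies to a given model depends on its position embedding technique and is not covered by this sketch.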
