March 6, 2024

Data Augmentation for Sentiment Analysis Enhancement: A Comparative Study

Abstract

Sentiment analysis, the task of identifying the feelings and opinions that people express in text, has been studied in natural language processing for many years. It typically begins by analyzing the sentiment of individual words or phrases, using dictionary-based methods, machine learning, or other natural language processing techniques. Machine learning approaches, however, depend on the quality and quantity of training data, which raises challenges such as data scarcity and imbalanced label distributions. One way to broaden the distribution of textual data is data augmentation, in which training samples are artificially transformed into new samples with similar context. Large language models offer an effective way to generate larger and more varied sets of textual samples. This paper therefore presents a comparative study of the impact of four augmentation methods: Random Deletion, Synonym Replacement, GPT-3.5 generation, and Character Swapping. We compare performance across six deep learning models: CNN-LSTM, Bi-LSTM, BERT, TCN, an Ensemble CNN Bidirectional GRU, and a deep neural network. The experimental results show that BERT achieves significant accuracy improvements across the augmentation methods, with gains of 14% for Random Deletion, 12.9% for Synonym Replacement, and 6.5% for Character Swapping.
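
To make the three rule-based augmentation methods concrete, the following is a minimal Python sketch of Random Deletion, Synonym Replacement, and Character Swapping in their commonly used forms. The abstract does not specify the authors' implementation, so the function names, the deletion probability `p`, the replacement count `n`, and the `TOY_SYNONYMS` table are all illustrative assumptions; in practice a lexical resource such as WordNet would likely stand in for the toy table. GPT-3.5 generation is omitted because it depends on an external API.

```python
import random

# Hypothetical toy synonym table, for illustration only.
TOY_SYNONYMS = {
    "good": ["fine", "decent"],
    "great": ["excellent", "superb"],
}

def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]

def synonym_replacement(words, synonyms, n, rng):
    """Replace up to n words that have an entry in the synonyms table."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w.lower() in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i].lower()])
    return out

def character_swap(word, rng):
    """Swap one randomly chosen pair of adjacent characters in a word."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = "the movie was good and the acting was great".split()
    print(" ".join(random_deletion(sentence, 0.2, rng)))
    print(" ".join(synonym_replacement(sentence, TOY_SYNONYMS, 1, rng)))
    print(" ".join(character_swap(w, rng) for w in sentence))
```

Each function returns a new token list or string and leaves its input unchanged, so the augmented samples can be added to a training set alongside the originals with the same sentiment label.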
