February 21, 2024

Sinhala-English Code-Mixed Language Dataset with Sentiment Annotation

Abstract

Technology plays a significant role in communication and is now an essential part of human life. Most people speak two or more languages to communicate better at the regional or global level. Code-mixing is the practice of mixing words from different languages in multilingual settings. There is a growing demand for sentiment analysis of code-mixed comments posted by users on social media. Systems trained on data in a single language fail on data in multiple languages because of the complexity of mixed data at different levels. However, very few code-mixed datasets are available for building models. No resources are available for Sinhala-English code-mixed language, so it is important for researchers to turn their attention to sentiment analysis of Sinhala-English mixed text. We present a sentiment-labeled corpus for sentiment analysis of code-mixed Sinhala-English text, built from comments on YouTube® videos. An annotation setup is used to label the comments and create the Sinhala-English dataset, and the comments are pre-processed to clean them. The entire dataset is divided into three classes: neutral, negative, and positive. To demonstrate the usefulness of the dataset, this study applies five machine learning algorithms to the newly created Sinhala-English dataset and achieves significant accuracy.
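
As a rough illustration of the cleaning and labeling steps mentioned in the abstract, the following minimal Python sketch cleans a comment, tags each token by Unicode script, and attaches one of the three sentiment classes. It is not the authors' pipeline: the function names, the URL-stripping rule, and the script-based heuristic are all assumptions made for illustration. In particular, Latin-script tokens may be romanized Sinhala, so the script check is only a coarse proxy for code-mixing.

```python
import re
import unicodedata

# Hypothetical helpers for illustration only; not taken from the paper.
SINHALA = re.compile(r"[\u0D80-\u0DFF]")   # Unicode Sinhala block
LATIN = re.compile(r"[A-Za-z]")            # basic Latin letters
URL = re.compile(r"https?://\S+|www\.\S+")
LABELS = ("negative", "neutral", "positive")  # the three classes named in the abstract


def clean_comment(text: str) -> str:
    """Normalize Unicode, drop URLs, and collapse whitespace (assumed cleaning rules)."""
    text = unicodedata.normalize("NFC", text)
    text = URL.sub(" ", text)
    return " ".join(text.split())


def token_script(token: str) -> str:
    """Return 'si' for Sinhala script, 'en' for Latin script, else 'other'."""
    if SINHALA.search(token):
        return "si"
    if LATIN.search(token):
        return "en"  # may also be romanized Sinhala; script alone cannot tell
    return "other"


def is_code_mixed(text: str) -> bool:
    """True if the comment contains both Sinhala-script and Latin-script tokens."""
    scripts = {token_script(t) for t in text.split()}
    return {"si", "en"} <= scripts


def make_record(text: str, label: str) -> dict:
    """Build one labeled example, rejecting labels outside the three classes."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    cleaned = clean_comment(text)
    return {"text": cleaned, "label": label, "code_mixed": is_code_mixed(cleaned)}
```

A call such as `make_record("\u0DB8\u0DDA\u0D9A super song", "positive")` would yield a record flagged as code-mixed. A real corpus would need language identification beyond script detection to handle Sinhala written in Latin characters.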

Cite This Study

Uthpala et al. (2024) studied this question.

synapsesocial.com/papers/68e7845cb6db6435876f7392 https://doi.org/10.1109/icarc61713.2024.10499746
