March 18, 2024
Open Access

Large Language Model-Based Emotional Speech Annotation Using Context and Acoustic Feature for Speech Emotion Recognition

Abstract

The remarkable emergence of large language models (LLMs) and their broad capabilities have opened up possible applications in many fields, including speech emotion recognition (SER). Despite advances in SER methods and the abundance of speech data, obtaining speech labeled with emotions remains difficult because human annotation is costly. In this study, we propose using LLMs to annotate emotional speech, investigating the use of the transcribed conversation sequence as context and incorporating textual descriptors of acoustic features into the prompt. We also examine using the annotation results as training data and as augmentation data for SER. Our experiments on the IEMOCAP dataset show that emotional speech annotation using LLMs can outperform human annotation, potentially at a lower annotation cost. An SER model trained on the annotation results, used either as the entire training set or as augmentation data, reaches performance close to that of state-of-the-art SER methods.
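The abstract describes prompting an LLM with the preceding conversation and with textual descriptors of acoustic features, then reusing the resulting labels as training or augmentation data. The paper's exact prompt, feature set, and label set are not given here, so the following is only a minimal sketch under stated assumptions: the four-class label set, the feature names (`mean_pitch_hz`, `mean_energy_db`, `speaking_rate_sps`), the thresholds, and every function name are hypothetical, and the actual LLM call is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical four-class label set (a common IEMOCAP setup), assumed here.
LABELS = ("neutral", "happy", "angry", "sad")


@dataclass
class Utterance:
    speaker: str
    text: str


@dataclass
class AcousticFeatures:
    # Hypothetical utterance-level features, not taken from the paper.
    mean_pitch_hz: float
    mean_energy_db: float  # e.g. mean RMS level in dBFS
    speaking_rate_sps: float  # syllables per second


def describe_level(value: float, low: float, high: float) -> str:
    """Map a numeric feature to a coarse textual descriptor."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "medium"


def describe_features(f: AcousticFeatures) -> str:
    # Fixed thresholds are illustrative assumptions only.
    pitch = describe_level(f.mean_pitch_hz, 120.0, 220.0)
    energy = describe_level(f.mean_energy_db, -30.0, -15.0)
    rate = describe_level(f.speaking_rate_sps, 3.0, 5.0)
    return f"pitch: {pitch}, energy: {energy}, speaking rate: {rate}"


def build_prompt(context: Sequence[Utterance], target: Utterance,
                 features: AcousticFeatures) -> str:
    """Assemble a text prompt from prior turns, the target turn, and acoustic descriptors."""
    lines = ["Conversation so far:"]
    lines += [f"{u.speaker}: {u.text}" for u in context]
    lines.append(f"Target utterance ({target.speaker}): {target.text}")
    lines.append(f"Acoustic description of the target: {describe_features(features)}")
    lines.append("Answer with one word from: " + ", ".join(LABELS) + ".")
    return "\n".join(lines)


def parse_label(reply: str) -> Optional[str]:
    """Return the first allowed label found in the LLM reply, or None."""
    words = reply.lower().replace(".", " ").replace(",", " ").split()
    for word in words:
        if word in LABELS:
            return word
    return None


def augment(human: List[Tuple[str, str]],
            llm: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Append LLM-annotated (utterance_id, label) pairs, skipping unparseable replies."""
    return list(human) + [(uid, lab) for uid, lab in llm if lab is not None]


if __name__ == "__main__":
    ctx = [Utterance("A", "I got the job!"), Utterance("B", "No way, really?")]
    tgt = Utterance("A", "Yes, they called this morning.")
    feats = AcousticFeatures(mean_pitch_hz=240.0, mean_energy_db=-12.0, speaking_rate_sps=4.0)
    print(build_prompt(ctx, tgt, feats))
```

Binning numeric features into words such as "low" or "high" is one simple way to let a text-only LLM consume acoustic cues. In practice, thresholds would more plausibly be derived per speaker (for example, from speaker-normalized statistics) than fixed as in this sketch.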