In recent years, speech-based emotion recognition, which can be deployed on a wide range of platforms, has been an active area of research. Although research on Korean-language emotion recognition has made significant progress in Korea, a major obstacle remains: the scarcity of Korean speech databases. This lack of data has caused overfitting in some models proposed in previous studies. This study therefore proposes a ResNet model trained with a data augmentation with saturation (DA-S) method to improve the speech emotion recognition performance of the existing model. Applying DA-S to the AI-HUB database increased the number of samples from 5,596 to 11,192. As a result, the proposed model addressed the overfitting issue and improved speech emotion recognition accuracy by 31.76%. In addition, experiments were conducted on all 11,192 samples, comprising both the original data and the DA-S-augmented data, to demonstrate both the effect of augmentation in transforming and expanding the data and the performance gain from the larger data volume. These experiments showed a 23.4% improvement when DA-S was applied.
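The abstract does not specify how DA-S is implemented or whether it operates on raw waveforms or on spectrogram images. As a minimal sketch, assuming "saturation" means tanh-style soft clipping of the raw waveform, the Python below shows how appending one saturated copy of every clip doubles a dataset, matching the 5,596 to 11,192 expansion. The functions `saturate` and `augment_dataset`, the `drive` parameter, and the synthetic clips and label ids in the usage lines are all hypothetical, not the authors' code.

```python
import numpy as np


def saturate(waveform: np.ndarray, drive: float = 4.0) -> np.ndarray:
    """Hypothetical DA-S stand-in: tanh soft clipping of a waveform in [-1, 1].

    ASSUMPTION: the source does not define DA-S; tanh saturation is illustrative only.
    """
    # Amplify by `drive`, compress peaks with tanh, then rescale so that an
    # input of +/-1.0 maps back to exactly +/-1.0.
    return np.tanh(drive * waveform) / np.tanh(drive)


def augment_dataset(samples, labels):
    """Return the originals followed by one saturated copy of each sample.

    Labels are repeated because saturation is assumed not to change the emotion class.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    augmented = [saturate(x) for x in samples]
    return list(samples) + augmented, list(labels) + list(labels)


# Usage with synthetic stand-ins for the 5,596 original clips (hypothetical data).
rng = np.random.default_rng(0)
clips = [rng.uniform(-1.0, 1.0, size=400) for _ in range(5596)]
labels = rng.integers(0, 7, size=5596).tolist()  # hypothetical class ids
X, y = augment_dataset(clips, labels)
print(len(X), len(y))  # 11192 11192
```

Keeping the originals and adding augmented copies, rather than replacing them, is what doubles the sample count. In a real pipeline, the original and augmented versions of the same clip should land in the same train/test split so that evaluation is not inflated by near-duplicates.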
Lee et al. (Sun,) studied this question.