July 10, 2025

Comparative Study on Fusion Method Based on Multimodal Speech Emotion Recognition of Speech and Text

Key Points

Bimodal audio-text models improved accuracy by over 15% compared to unimodal models, enhancing emotion detection.
Current advanced fusion models achieve over 75% accuracy with relatively short training times, preliminarily meeting the accuracy and real-time requirements of games.
The study utilized the IEMOCAP dataset for training and evaluation in emotion recognition applications.
The findings suggest that combining speech and text could improve NPC (non-player character) interactions in games.

Abstract

With the rapid development of artificial intelligence and deep learning, emotion recognition has become an important research area in human-computer interaction. In the gaming industry, however, it is rarely used to make NPCs (non-player characters) more intelligent or to enhance immersion. This study therefore explores the feasibility of applying multimodal emotion recognition in gaming scenarios, aiming to improve recognition accuracy by combining speech and textual information and thereby to optimize NPC interactions within games. The study uses the IEMOCAP dataset, integrates audio and textual features, and trains and evaluates a range of machine learning and deep learning models. It also compares the accuracy and training speed of several advanced fusion models to investigate whether these techniques can meet the accuracy and real-time requirements of gaming applications. The results show that bimodal audio-text models significantly outperform unimodal models, improving accuracy by more than 15%. Current advanced models achieve an accuracy of over 75% with relatively short training times, preliminarily meeting the requirements for accuracy and real-time application in games.
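The abstract does not say which fusion methods were compared, so the following is only a minimal sketch of two common ways to combine an audio model and a text model: feature-level (early) fusion by concatenation, and decision-level (late) fusion by averaging class probabilities. The label set, function names, and the example probabilities are all assumptions made for illustration; none of them come from the study.

```python
from typing import List, Sequence

# Assumed label set, for illustration only (not taken from the study).
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def early_fusion(audio_feats: Sequence[float], text_feats: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate the two feature vectors.

    The joint vector would then be fed to a single classifier.
    """
    return list(audio_feats) + list(text_feats)


def late_fusion(audio_probs: Sequence[float], text_probs: Sequence[float],
                audio_weight: float = 0.5) -> List[float]:
    """Decision-level fusion: weighted average of per-class probabilities."""
    if len(audio_probs) != len(text_probs):
        raise ValueError("both models must score the same set of classes")
    w = audio_weight
    return [w * a + (1.0 - w) * t for a, t in zip(audio_probs, text_probs)]


def predict(probs: Sequence[float]) -> str:
    """Return the label with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return EMOTIONS[best]


# Hypothetical outputs of two unimodal models for one utterance.
audio_probs = [0.40, 0.10, 0.30, 0.20]
text_probs = [0.10, 0.60, 0.20, 0.10]

fused = late_fusion(audio_probs, text_probs)
print(predict(audio_probs), predict(text_probs), predict(fused))
```

In this made-up case the audio model alone would pick a different label than the fused prediction, which is the kind of disagreement a bimodal system is meant to resolve. Which fusion strategy works best on IEMOCAP is an empirical question that the study itself addresses.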
