With the rapid development of artificial intelligence and deep learning, emotion recognition has become an important research area in human-computer interaction. In the gaming industry, however, it is rarely used to make NPCs (non-player characters) more intelligent or to enhance immersion. This study explores the feasibility of applying multimodal emotion recognition in gaming scenarios, aiming to improve recognition accuracy by combining speech and textual information and thereby to improve NPC interactions within games. The study uses the IEMOCAP dataset, integrates audio and textual features, and trains and evaluates a range of machine learning and deep learning models. It also compares the accuracy and training speed of several advanced fusion models to investigate whether these technologies can meet the accuracy and real-time requirements of gaming applications. The results show that bimodal audio-text models significantly outperform unimodal models, with an improvement exceeding 15%. Current advanced models achieve an accuracy of over 75% with relatively short training times, preliminarily meeting the accuracy and real-time requirements of games.
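As a rough illustration of what integrating audio and textual features can mean, the sketch below shows feature-level (early) fusion by concatenation, followed by a nearest-centroid classifier. It is not the study's pipeline: the function names `fuse`, `fit_centroids`, and `predict`, the four-dimensional toy vectors, and the choice of classifier are all assumptions made for the example, and the numbers are invented rather than taken from IEMOCAP.

```python
from math import dist


def fuse(audio_vec, text_vec):
    """Feature-level (early) fusion: concatenate the two modality vectors."""
    return list(audio_vec) + list(text_vec)


def fit_centroids(samples, labels):
    """Average the fused vectors belonging to each emotion label."""
    sums, counts = {}, {}
    for vec, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Toy, made-up features (hypothetical, NOT from IEMOCAP):
# two audio dimensions followed by two text dimensions per utterance.
train = [
    (fuse([0.9, 0.8], [0.1, 0.9]), "angry"),
    (fuse([0.8, 0.9], [0.2, 0.8]), "angry"),
    (fuse([0.2, 0.1], [0.9, 0.1]), "happy"),
    (fuse([0.1, 0.2], [0.8, 0.2]), "happy"),
]
centroids = fit_centroids([v for v, _ in train], [l for _, l in train])

# Classify a new utterance from its fused audio + text representation.
prediction = predict(centroids, fuse([0.85, 0.75], [0.15, 0.85]))
print(prediction)
```

In a realistic setting the hand-written vectors would be replaced by learned acoustic and text embeddings, and the centroid classifier by one of the trained models the study compares; the concatenation step is the only part intended to carry over.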
Xinheng Xie
Transactions on Computer Science and Intelligent Systems Research
www.synapsesocial.com/papers/68af55ccad7bf08b1eadc211 — DOI: https://doi.org/10.62051/qadyms14