January 1, 2023

A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition

Abstract

Current speech emotion recognition models still do not achieve satisfactory performance because of the complexity of emotions. A problem common to most previous studies is that certain emotions are severely misclassified. In this article, we propose a novel framework for speech emotion recognition that integrates a cascaded attention network with an adversarial joint loss strategy. It aims to resolve these confusions by placing more emphasis on the emotions that are difficult to classify correctly. First, we extract log-Mel spectrograms together with their deltas and delta-deltas as 3D features to effectively reduce the interference of external factors. Next, we introduce a cascaded attention network to extract effective emotional features, in which spatiotemporal attention selectively locates the targeted emotional regions in the input features. Within these targeted regions, self-attention with head fusion captures long-range dependencies among temporal features. Finally, we propose an adversarial joint loss strategy that distinguishes highly similar emotional embeddings by generating hard triplets in an adversarial fashion. To evaluate the proposed method, experiments are performed on the IEMOCAP, CASIA, and EMODB corpora. The experimental results demonstrate that the proposed method significantly outperforms state-of-the-art approaches on all three datasets.
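The first step above, stacking a log-Mel spectrogram with its first- and second-order deltas into a three-channel input, can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes HTK-style regression deltas, d_t = sum_{k=1..N} k (c_{t+k} - c_{t-k}) / (2 sum_{k=1..N} k^2), with a window of N = 2 frames and edge-frame replication. The helper names `delta` and `stack_3d` are hypothetical, and log-Mel extraction itself is not shown; the input is assumed to be a frames-by-mel-bins matrix.

```python
from typing import List

Matrix = List[List[float]]  # rows are time frames, columns are mel bins


def delta(features: Matrix, n: int = 2) -> Matrix:
    """HTK-style regression deltas along time (assumed window n), replicating edge frames."""
    num_frames = len(features)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for b in range(len(features[t])):
            acc = 0.0
            for k in range(1, n + 1):
                # Clamp indices so frames beyond the edges reuse the first/last frame.
                ahead = features[min(t + k, num_frames - 1)][b]
                behind = features[max(t - k, 0)][b]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_3d(log_mel: Matrix, n: int = 2) -> List[Matrix]:
    """Return [static, delta, delta-delta] as three channels of equal shape."""
    first = delta(log_mel, n)
    second = delta(first, n)  # delta-deltas are deltas of the deltas
    return [log_mel, first, second]


if __name__ == "__main__":
    ramp = [[float(t)] for t in range(5)]  # one mel bin rising linearly over 5 frames
    print([round(r[0], 3) for r in delta(ramp)])
```

Running the script prints `[0.5, 0.8, 1.0, 0.8, 0.5]`: interior frames recover the true slope of 1, while edge replication shrinks the estimate near the boundaries. In practice a library routine such as `librosa.feature.delta` may be used instead, though its default window (width 9) and edge handling differ from this sketch, so the exact values would not match.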

Authors

Yang Liu

Haoqin Sun

Wenbo Guan

Journals

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Institutions

Chinese Academy of Sciences

Institute of Automation

Qingdao University of Science and Technology