Emerging technologies for human emotion recognition (HER) have drawn considerable attention for their use in security, intelligent customer service, healthcare, education, human-robot interaction (HRI), and adaptive system training. This research designs a novel Multi-Module Neural Networks (MMNNs) architecture for HER, aimed at identifying human emotions in practical applications. The model incorporates MobileNetV3, Vision Transformer (ViT), RegNet, and SE-ResNeXt into a single deep ensemble classification structure, and integrates MMNNs with Transfer Learning (TL) to train the CNNs and improve HER performance. The MMNNs classification model is trained by combining features from the four models using feature pooling. The key novelty of the model is a DEtection TRansformer (DETR) that enhances the CNN learning block. It consists of a CNN that learns a low-dimensional feature representation, an encoder-decoder transformer, and a simple Feed Forward Network (FFN) that outputs the final detection prediction, which ultimately boosts face recognition efficiency and accuracy. The MMNNs results are validated on AffectNet, CK+, and a custom-made dataset (CMD), achieving accuracies of 91.07%, 87.03%, and 96.98%, respectively; data augmentation further increases these to 95.09%, 89.15%, and 98.13%, respectively.
Zaman et al. studied this question.
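The abstract states that features from the four models are combined using feature pooling but does not specify the pooling operator or the fusion rule. The following is a minimal sketch of one plausible reading, assuming global average pooling per channel followed by concatenation. The function names, the backbone keys, and the toy feature maps are all hypothetical stand-ins for illustration, not the authors' implementation.

```python
from typing import Dict, List

# A feature map is represented as channels x height x width nested lists.
FeatureMap = List[List[List[float]]]

# Fixed fusion order, following the order in which the abstract lists the models.
# The key names are illustrative only.
BACKBONE_ORDER = ["mobilenetv3", "vit", "regnet", "se_resnext"]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Reduce each channel (an H x W grid) to its mean value."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_features(backbone_outputs: Dict[str, FeatureMap]) -> List[float]:
    """Pool every backbone's feature map and concatenate the results.

    Assumption: fusion is plain concatenation of pooled vectors; the source
    only says the features are combined "using feature pooling".
    """
    fused: List[float] = []
    for name in BACKBONE_ORDER:
        fused.extend(global_average_pool(backbone_outputs[name]))
    return fused


# Toy feature maps standing in for real backbone activations.
toy_outputs: Dict[str, FeatureMap] = {
    "mobilenetv3": [[[1.0, 3.0], [5.0, 7.0]]],
    "vit": [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 4.0], [0.0, 4.0]]],
    "regnet": [[[0.0, 0.0], [0.0, 8.0]]],
    "se_resnext": [[[6.0, 6.0], [6.0, 6.0]]],
}

fused_vector = fuse_features(toy_outputs)
print(fused_vector)
```

In a real pipeline the fused vector would feed a classification head over the emotion classes. Concatenation keeps each model's features separate at the cost of a wider input, whereas averaging or weighting the pooled vectors would require the backbones to share a feature dimension.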