Facial emotion recognition (FER) plays a critical role in many applications in human–computer interaction, psychological analysis, and affective computing, enabling intelligent systems to perceive human emotions effectively. Current approaches remain weak in feature representation, robustness to facial expression variation, and model generalization. To address these shortcomings, this paper presents a new FER model that integrates SiaCon-DetNet with the HySHO algorithm. The main novelty of SiaCon-DetNet is that it combines convolutional feature learning with transformer attention mechanisms to detect fine-grained facial features reliably. The proposed framework couples bio-inspired optimization with deep learning, yielding an adaptive and efficient emotion detector: HySHO dynamically adjusts model parameters to improve learning convergence and reduce computational overhead. The method follows a structured pipeline that begins with face region detection by a Siamese convolutional network, followed by feature enhancement through multi-head self-attention in the detection transformer network. Comparative analysis shows that the proposed model outperforms the compared FER methods, reaching up to 99.20% accuracy on the JAFFE database with very short training times. Emotion-wise correlation and performance testing further validate the reliability of the proposed framework, with precision, recall, and F1-score consistently between 98% and 99%.
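For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy and per-emotion precision, recall, and F1-score are conventionally derived from a confusion matrix. The function names and the three-class toy matrix are illustrative assumptions and are not taken from the paper; the numbers are invented for the example and do not reproduce any reported result.

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified samples divided by all samples.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


def per_class_metrics(confusion):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        # False positives: samples predicted as k whose true class differs.
        fp = sum(confusion[i][k] for i in range(n)) - tp
        # False negatives: samples of true class k predicted as another class.
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


# Hypothetical toy data: 3 emotion classes, 10 samples each (not from the paper).
toy_confusion = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]

if __name__ == "__main__":
    print("accuracy:", round(accuracy(toy_confusion), 4))
    for label, (p, r, f) in enumerate(per_class_metrics(toy_confusion)):
        print(label, round(p, 4), round(r, 4), round(f, 4))
```

Reporting the metrics per emotion, rather than only as an average, makes it visible when a single class (for example, one that is easily confused with another) lags behind the overall accuracy.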