Recently, the convergence of advanced sensor technologies and innovations in artificial intelligence and robotics has highlighted facial emotion recognition (FER) as an essential component of human–computer interaction (HCI). Traditional FER studies based on handcrafted features and shallow machine learning have shown a limited performance, while convolutional neural networks (CNNs) have improved nonlinear emotion pattern analysis but have been constrained by local feature extraction. Vision transformers (ViTs) have addressed this by leveraging global correlations, yet both CNN- and ViT-based single networks often suffer from overfitting, single-network dependency, and information loss in ensemble operations. To overcome these limitations, we propose ArecaNet, an assembled residual enhanced cross-attention network that integrates multiple feature streams without information loss. The framework comprises (i) channel and spatial feature extraction via SCSESResNet, (ii) landmark feature extraction from specialized sub-networks, (iii) iterative fusion through residual enhanced cross-attention, (iv) final emotion classification from the fused representation. Our research introduces a novel approach by integrating pre-trained sub-networks specialized in facial recognition with an attention mechanism and our uniquely designed main network, which is optimized for size reduction and efficient feature extraction. The extracted features are fused through an iterative residual enhanced cross-attention mechanism, which minimizes information loss and preserves complementary representations across networks. This strategy overcomes the limitations of conventional ensemble methods, enabling seamless feature integration and robust recognition. The experimental results show that the proposed ArecaNet achieved accuracies of 97.0% and 97.8% using the public databases, FER-2013 and RAF-DB, which were 4.5% better than the existing state-of-the-art method, PAtt-Lite, for FER-2013 and 2.75% for RAF-DB, and achieved a new state-of-the-art accuracy for each database.
Building similarity graph...
Analyzing shared references across papers
Loading...
Jae Myung Kim
Pohang University of Science and Technology
Gyu Ho Choi
Chosun University
Sensors
Chosun University
Building similarity graph...
Analyzing shared references across papers
Loading...
Kim et al. (Thu,) studied this question.
synapsesocial.com/papers/694023fa2d562116f28fdcb8 — DOI: https://doi.org/10.3390/s25237375