Facial expressions serve as a fundamental non-verbal channel for conveying human emotions and intentions. This review provides a systematic examination of Convolutional Neural Network (CNN)-based methods for Facial Expression Recognition (FER). To address challenges inherent in real-world scenarios, such as occlusions, pose variations, and the subtlety of expressions, research has shifted from standard network architectures toward more specialized designs. The paper elaborates on three primary technical paradigms. The first is attention-based CNN architectures, such as channel and spatial attention mechanisms, which enhance robustness by dynamically focusing on informative feature channels and critical facial regions. The second is multi-modal fusion, which compensates for the limitations of relying solely on RGB data by integrating geometric features, thermal imaging, or depth information, an approach that has proven particularly effective under occlusion. The third is the integration of local and global features, including multi-pathway networks and 3D CNNs, which capture hierarchical information ranging from fine-grained muscle movements to holistic expressive dynamics. Despite continued methodological innovation, the field still faces several challenges, including data bias, insufficient environmental robustness, the difficulty of micro-expression recognition, and computational efficiency demands. Looking ahead, key directions for advancing FER toward more reliable, efficient, and practical deployment include integrating causal reasoning with symbolic knowledge and developing dynamic, adaptive inference frameworks.
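The channel and spatial attention paradigm can be illustrated with a minimal NumPy sketch. It applies channel attention and then spatial attention to a single (C, H, W) feature map, following the general pattern of the widely used squeeze-and-excitation and CBAM-style modules. This is not the method of any specific paper covered by the review. The function names, the reduction ratio `r`, the 7×7 spatial kernel, and the random weights, which stand in for learned parameters, are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) map.

    Hypothetical weights: w1 has shape (C // r, C), w2 has shape (C, C // r).
    Average- and max-pooled channel descriptors pass through a shared
    two-layer MLP (ReLU in between); their sum is squashed to (0, 1).
    """
    avg = feat.mean(axis=(1, 2))  # (C,) global average pooling
    mx = feat.max(axis=(1, 2))    # (C,) global max pooling

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    weights = sigmoid(mlp(avg) + mlp(mx))  # (C,) one weight per channel
    return feat * weights[:, None, None], weights


def spatial_attention(feat, kernel):
    """Reweight spatial positions of a (C, H, W) map.

    Channel-wise mean and max maps are stacked into (2, H, W), convolved
    with a hypothetical (2, k, k) kernel (odd k, zero "same" padding),
    and squashed to a (H, W) mask in (0, 1).
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    logits = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    mask = sigmoid(logits)  # high values mark salient facial regions
    return feat * mask[None, :, :], mask


# Usage on a random feature map (stand-in for a CNN feature map of a face).
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

x_c, cw = channel_attention(feat, w1, w2)   # channel-refined features
x_cs, sm = spatial_attention(x_c, kernel)   # then spatially refined
print(x_cs.shape, cw.shape, sm.shape)
```

Applying channel attention before spatial attention mirrors the sequential ordering common in CBAM-style designs. Because both steps only rescale the features, the output keeps the input shape and can be dropped between convolutional stages of an FER backbone.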
Yitong Chen studied this question.