Facial expression recognition (FER), applied in fields such as interaction and intelligent security, has developed rapidly with advances in machine vision. In natural environments, however, faces are often obscured by masks, pose, or body parts, and the resulting incomplete features degrade the accuracy of existing FER algorithms. Except in extreme cases where the face is completely blocked, most of the key expression information is preserved, yet insufficient parsing of the remaining features leads to poor recognition results. To address this, we propose a novel joint learning framework that integrates explicit occlusion parsing and feature enhancement. Our model consists of three core modules: a Facial Occlusion Parsing Module (FOPM) for real-time occlusion estimation, an Expression Feature Fusion Module (EFFM) for integrating appearance and geometric features, and a Facial Expression Recognition Module (FERM) for final classification. Extensive experiments under a rigorous and reproducible protocol demonstrate significant improvements over existing methods. On the masked facial expression datasets RAF-DB and FER+, our model achieves accuracies of 91.24% and 90.18%, surpassing previous state-of-the-art methods by 2.62% and 0.96%, respectively. An additional evaluation on a real-world masked dataset with diverse mask types, on which the model attains an accuracy of 89.75%, further confirms the robustness and generalizability of our method. Moreover, the model maintains high computational efficiency, with an inference time of 12.4 ms per image. By effectively parsing and integrating partially obscured facial features, our approach enables more accurate and robust expression recognition, which is essential for real-world applications in interaction and intelligent security systems.
Hou et al. studied this question.
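The data flow through the three modules named in the abstract (FOPM, then EFFM, then FERM) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the internals of any module, so the per-region visibility scores, the weighting-and-concatenation fusion, the linear softmax classifier, the label set, and all function names below are hypothetical stand-ins chosen only to show how an occlusion estimate might gate features before classification.

```python
import math

# Hypothetical label set for illustration; the real datasets use more classes.
LABELS = ["happy", "sad", "surprise"]


def parse_occlusion(region_visibility):
    """Stand-in for FOPM: clamp per-region visibility scores to [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in region_visibility]


def fuse_features(appearance, geometry, visibility):
    """Stand-in for EFFM: down-weight occluded regions, then concatenate
    each region's appearance and geometric features into one vector."""
    fused = []
    for app, geo, vis in zip(appearance, geometry, visibility):
        fused.extend(vis * a for a in app)
        fused.extend(vis * g for g in geo)
    return fused


def classify(fused, weights, biases):
    """Stand-in for FERM: one linear layer followed by a stable softmax."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(appearance, geometry, region_visibility, weights, biases):
    """Run the assumed pipeline: occlusion parsing, fusion, classification."""
    visibility = parse_occlusion(region_visibility)
    fused = fuse_features(appearance, geometry, visibility)
    probs = classify(fused, weights, biases)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs
```

In this sketch a fully masked region (visibility 0) contributes nothing to the fused vector, so the classifier relies only on the visible regions. A learned model would presumably estimate visibility and fusion weights jointly rather than use fixed rules like these.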