What type of study is this?

This is a quantitative study.

September 28, 2025 · Open Access

Joint Learning for Mask-Aware Facial Expression Recognition Based on Exposed Feature Analysis and Occlusion Feature Enhancement

Key Points

The proposed method improves accuracy over previous state-of-the-art methods by up to 2.62% on the masked RAF-DB and FER+ benchmark datasets.
A joint learning framework estimates occlusion explicitly and fuses appearance and geometric features.
The model is computationally efficient, with an inference time of 12.4 ms per image.
The method improves facial expression recognition under occlusion, supporting applications in intelligent security and interaction.
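
As a quick sanity check on the efficiency figure, the reported per-image latency of 12.4 ms can be converted to throughput. This is a small sketch that assumes images are processed one at a time, with no batching or pipelining, which the summary does not state; the function name `images_per_second` is illustrative only.

```python
def images_per_second(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to sequential throughput."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Reported inference time: 12.4 ms per image.
# Assumption: strictly sequential processing, one image at a time.
throughput = images_per_second(12.4)
print(f"{throughput:.1f} images/s")
```

Under that assumption, 12.4 ms per image corresponds to roughly 80 images per second.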

Abstract

Facial expression recognition (FER), applied in fields such as interaction and intelligent security, has developed rapidly with advances in machine vision technology. In natural environments, however, faces are often obscured by masks, posture, and body parts, which leaves features incomplete and degrades the accuracy of existing FER algorithms. Except in extreme scenarios where facial features are completely blocked, most of the key expression information is preserved, yet insufficient parsing of these remaining features leads to poor recognition results. To address this, we propose a novel joint learning framework that integrates explicit occlusion parsing and feature enhancement. Our model consists of three core modules: a Facial Occlusion Parsing Module (FOPM) for real-time occlusion estimation, an Expression Feature Fusion Module (EFFM) for integrating appearance and geometric features, and a Facial Expression Recognition Module (FERM) for final classification. Extensive experiments under a rigorous and reproducible protocol show that our approach yields significant improvements. On the masked facial expression datasets RAF-DB and FER+, our model achieves accuracies of 91.24% and 90.18%, surpassing previous state-of-the-art methods by 2.62% and 0.96%, respectively. Additional evaluation on a real-world masked dataset with diverse mask types further confirms the robustness and generalizability of our method, with an accuracy of 89.75%. Moreover, the model maintains high computational efficiency, with an inference time of 12.4 ms per image. By effectively parsing and integrating partially obscured facial features, our approach enables more accurate and robust expression recognition, which is essential for real-world applications in interaction and intelligent security systems.
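
The data flow through the three modules named in the abstract can be illustrated with a toy sketch. This is not the paper's architecture: the summary gives no implementation details, so every function body, the two-region face layout, the feature values, the class weights, and the label names below are hypothetical stand-ins chosen only to show how occlusion estimates might gate feature fusion before classification.

```python
from typing import List, Sequence


def fopm(occlusion_prob: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for occlusion parsing: map per-region
    occlusion probabilities to visibility weights in [0, 1]."""
    return [1.0 - min(max(p, 0.0), 1.0) for p in occlusion_prob]


def effm(appearance: Sequence[float], geometric: Sequence[float],
         visibility: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for feature fusion: scale each region's
    appearance and geometric feature by its visibility, then concatenate."""
    fused_app = [v * a for v, a in zip(visibility, appearance)]
    fused_geo = [v * g for v, g in zip(visibility, geometric)]
    return fused_app + fused_geo


def ferm(features: Sequence[float], class_weights: Sequence[Sequence[float]],
         labels: Sequence[str]) -> str:
    """Hypothetical stand-in for classification: linear scores, then argmax."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in class_weights]
    return labels[scores.index(max(scores))]


def recognize(occlusion_prob, appearance, geometric, class_weights, labels) -> str:
    """Chain the three stand-in modules: parse occlusion, fuse, classify."""
    visibility = fopm(occlusion_prob)
    fused = effm(appearance, geometric, visibility)
    return ferm(fused, class_weights, labels)


# Toy input with two regions (eyes, mouth); the mouth is fully masked.
labels = ["happy", "neutral"]             # illustrative labels only
class_weights = [[1.0, 0.0, 1.0, 0.0],    # "happy" reads the eye features
                 [0.0, 1.0, 0.0, 1.0]]    # "neutral" reads the mouth features
prediction = recognize(
    occlusion_prob=[0.0, 1.0],
    appearance=[0.9, 0.8],
    geometric=[0.7, 0.6],
    class_weights=class_weights,
    labels=labels,
)
print(prediction)
```

In this toy setup the masked mouth region contributes nothing to the fused vector, so the decision rests entirely on the exposed eye region. A real system would presumably learn all three stages jointly with neural networks rather than use fixed rules like these.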
