What question did this study set out to answer?

To develop a facial emotion recognition framework that addresses privacy concerns and enhances explainability for mental health monitoring.

February 28, 2026 · Open Access

Human-Centered and Quantitative Explainability Evaluation of Facial Emotion Recognition for Trustworthy Mental Health Monitoring

Key Points

Proposed a facial emotion recognition (FER) framework that addresses privacy concerns and enhances explainability for mental health monitoring.
Implemented a collaborative distributed training approach using lightweight CNNs.
Used three heterogeneous datasets (RAF-DB, ExpW, and FER2013) for cross-dataset training.
Employed SHAP to guide the optimization of CNN filter configurations and enhance interpretability.
Created a multidimensional evaluation framework combining qualitative and quantitative metrics.
Achieved a mean accuracy of 74.3% in cross-dataset generalization.
Demonstrated alignment between the quantitative Global Explanation Quality Score (GEQS) and human evaluation in a user study.
Showed practical deployment feasibility on a Raspberry Pi 4, achieving a 60 ms inference time.
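
The key points mention collaborative distributed training but do not say how the locally trained models are combined. One common option, assumed here purely for illustration and not confirmed as the study's method, is federated averaging, in which each site's parameters are weighted by its local sample count. The function name `fed_avg` and the toy numbers below are hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Sample-weighted average of flat parameter vectors (federated averaging).

    ASSUMPTION: the summarized study does not state its aggregation rule;
    this is a generic sketch, not the authors' implementation.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    # Each parameter is averaged across clients, weighted by local data size.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    # Two hypothetical sites with toy two-parameter models.
    print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Only parameters and sample counts are exchanged in this scheme, so raw face images never leave the site that holds them, which is the privacy property the framework targets.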

Abstract

Facial emotion recognition (FER) systems can serve as a valuable non-invasive tool for assessing emotional states linked to mental health. However, two main issues hinder their adoption in clinical settings: privacy concerns inherent to centralized data processing and the lack of transparent decision-making processes. This paper proposes a privacy-preserving and explainable FER framework that implements a collaborative distributed training approach for lightweight convolutional neural network (CNN) architectures across three heterogeneous datasets: RAF-DB, ExpW, and FER2013. SHapley Additive exPlanations (SHAP) guided the optimization of CNN filter configurations, prioritizing high accuracy, cross-dataset generalization, and interpretable, trustworthy explanations. A multidimensional explainability evaluation framework is developed that combines perturbation-based faithfulness and feature localization metrics into a Global Explanation Quality Score (GEQS) for quantitative assessment of explanation quality. A qualitative user study was conducted to assess alignment between human perception and quantitative explainability metrics. Guided by explainability-driven evaluation, a lightweight CNN achieves 74.3% mean accuracy, highlighting its effectiveness for cross-dataset generalization. Results demonstrate alignment between quantitative GEQS and human evaluation, with both identifying the same model architecture. However, qualitative analysis shows that highlighting emotion-relevant facial features does not always ensure user trust. The practical viability of the proposed FER system in resource-constrained clinical environments is demonstrated through implementation on a Raspberry Pi 4 with integrated SHAP explainability, achieving a 60 ms inference time.
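
The abstract says GEQS combines perturbation-based faithfulness with feature localization, but it does not give the formula. The sketch below shows one plausible shape for such a score under stated assumptions: faithfulness is measured as the confidence drop after masking the most-attributed features, localization as the share of attribution mass inside a marked region, and the two are merged with a hypothetical weight `w`. All function names, the weight, and the toy model are illustrative and are not taken from the paper.

```python
from typing import Callable, Sequence


def deletion_faithfulness(predict: Callable[[Sequence[float]], float],
                          features: Sequence[float],
                          attributions: Sequence[float],
                          k: int = 1,
                          baseline: float = 0.0) -> float:
    """Confidence drop after replacing the k most-attributed features with a baseline."""
    ranked = sorted(range(len(features)),
                    key=lambda i: abs(attributions[i]), reverse=True)
    top = set(ranked[:k])
    perturbed = [baseline if i in top else x for i, x in enumerate(features)]
    return predict(features) - predict(perturbed)


def localization(attributions: Sequence[float],
                 relevant: Sequence[bool]) -> float:
    """Fraction of absolute attribution mass that falls inside the relevant region."""
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0
    inside = sum(abs(a) for a, r in zip(attributions, relevant) if r)
    return inside / total


def toy_quality_score(faithfulness: float, localization_score: float,
                      w: float = 0.5) -> float:
    """HYPOTHETICAL combination: weighted mean of the two metrics, each in [0, 1].

    The real GEQS definition is not given in the summarized text.
    """
    f = min(max(faithfulness, 0.0), 1.0)  # clip so both terms share a scale
    return w * f + (1.0 - w) * localization_score


if __name__ == "__main__":
    # Toy linear "model" over four features; attributions mimic SHAP-style values.
    model = lambda x: 0.5 * x[0] + 0.25 * x[1]
    x = [1.0, 1.0, 1.0, 1.0]
    attr = [0.5, 0.25, 0.0, 0.0]
    f = deletion_faithfulness(model, x, attr, k=1)
    loc = localization(attr, [True, False, False, False])
    print(f, loc, toy_quality_score(f, loc))
```

In an image setting, the feature vector would be pixels or superpixels and the relevant region a mask over emotion-relevant facial areas; how such regions were defined in the study is not stated here.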
