What question did this study set out to answer?

This study aims to compare different deep learning architectures for facial emotion recognition, focusing on CNN, ResNet-18, and MobileNetV2.

May 15, 2026 | Open Access

Facial Emotion Recognition Using Deep Learning: A Comparative Study of CNN, ResNet-18 and MobileNetV2

Key Points

Compared two CNN-based architectures, ResNet-18 and MobileNetV2, for emotion detection.
Assessed performance across benchmark datasets including FER2013, CK+, RAF-DB, AffectNet, and JAFFE.
Evaluated key metrics such as accuracy, precision, recall, and F1-score.
ResNet-18 demonstrates strong recognition performance but requires more computational resources than MobileNetV2.
MobileNetV2 shows higher efficiency, making it suitable for real-time applications under resource constraints.
Identified key challenges in FER, such as class imbalance and demographic bias, that affect accuracy.
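
One source of the efficiency gap noted in the key points above is that MobileNetV2 is built on depthwise separable convolutions, whereas ResNet-18 uses standard 3x3 convolutions. The sketch below is a rough illustration only: it compares weight counts for a single layer, the channel sizes (64 in, 128 out) are assumed for the example, and the function names are hypothetical. It ignores MobileNetV2's inverted residual expansion layers and is not taken from the paper.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Illustrative channel sizes, assumed for this example only.
c_in, c_out = 64, 128
std = standard_conv_params(c_in, c_out)
sep = depthwise_separable_params(c_in, c_out)

print("standard conv weights:      ", std)
print("depthwise separable weights:", sep)
print("reduction factor:           ", round(std / sep, 1))
```

For these sizes the separable layer needs roughly an eighth of the weights, which is the kind of saving that makes such designs attractive on resource-constrained devices.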

Abstract

Facial Emotion Recognition (FER) has emerged as an important research area in artificial intelligence and computer vision due to its wide applications in human–computer interaction, healthcare, surveillance, and affective computing. Recent advancements in deep learning, particularly Convolutional Neural Networks (CNNs), have significantly improved the ability of FER systems to automatically learn discriminative facial features from image data. This review paper presents a comparative study of CNN-based architectures, with a primary focus on ResNet-18 and MobileNetV2 for FER applications. The study examines their architectural design, computational efficiency, and recognition performance across widely used benchmark datasets such as FER2013, CK+, RAF-DB, AffectNet, and JAFFE. In addition, commonly used evaluation metrics, including accuracy, precision, recall, and F1-score, are discussed. The paper highlights key challenges affecting FER systems, including class imbalance, demographic bias, illumination variation, occlusion, and limited generalization across datasets. Furthermore, emerging research directions such as multimodal learning, domain adaptation, synthetic data generation, explainable AI, and personalized FER models are explored. The review concludes that while ResNet-18 provides strong recognition performance, MobileNetV2 offers a more efficient solution for real-time and resource-constrained applications.
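
The evaluation metrics listed in the abstract behave very differently under class imbalance, one of the challenges the review highlights. The following minimal sketch, in pure Python, computes accuracy and per-class precision, recall, and F1-score on a small made-up label set. The data and function names are hypothetical and serve only to show why accuracy alone can be misleading; they are not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    metrics = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_f1(metrics):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


# Toy, imbalanced example (assumed data): 8 "happy" samples, 2 "disgust".
y_true = ["happy"] * 8 + ["disgust"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["happy"] * 10

acc = accuracy(y_true, y_pred)
metrics = per_class_metrics(y_true, y_pred, ["happy", "disgust"])
macro = macro_f1(metrics)

print("accuracy:", acc)
print("per-class (precision, recall, f1):", metrics)
print("macro F1:", macro)
```

Here the always-majority classifier reaches 80% accuracy while never detecting the minority class, and the macro-averaged F1-score drops well below the accuracy. This is why precision, recall, and F1-score are usually reported alongside accuracy for imbalanced FER datasets.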
