Facial emotion recognition (FER), the automatic recognition of emotions from images of the human face, is a core topic of ongoing research in computer vision and artificial intelligence. The principal aim of FER is to recognize emotional states such as happiness, sadness, anger, fear, disgust, surprise, and neutrality. This review presents a comprehensive comparison of 11 popular FER datasets: CK+, FER-2013, JAFFE, SAVEE, AffectNet, KDEF, RAF-DB, RAVDESS, RFD, Oulu-CASIA NIR & VIS, and SFEW 2.0. The datasets are compared across a range of characteristics, including dataset size, emotion classes, data types, participant age ranges, and the distribution of samples across emotions. The study also contrasts the accuracy of classical methods (e.g., HOG features with SVM classifiers) against newer deep learning models (e.g., CNNs and LSTMs) on these datasets. Accuracy is very high on laboratory-controlled data (e.g., 99.68% with SCNN on CK+) but remains substantially lower under real-world conditions (e.g., 61.29% with a CNN on SFEW 2.0). The study discusses the key challenges behind this gap, namely imbalanced emotion distributions, demographic bias, and the heterogeneity of real-world conditions. It identifies shortcomings of existing datasets and approaches, and suggests future research directions: increasing dataset diversity, multimodal fusion, and building culturally adaptive models. This review aims to strengthen the scientific basis and broaden the applications of FER, and to serve as a comprehensive guide for researchers and practitioners.
Kayhan et al. studied this question.