With the advent of deep generative models, there has been growing interest in manipulating people’s facial features, with potential applications in fashion and biometrics. The task is complex: modifying a given attribute should not affect the others, identity should be preserved, and image quality should not degrade. So far, proposed methods have been evaluated mostly qualitatively, which is insufficient to demonstrate progress and performance. We propose a comprehensive evaluation framework that estimates the quality of facial attribute editing methods with respect to several criteria: image quality, effective modification of the targeted attribute, level of entanglement between attributes, and identity preservation. Three generative models are used to demonstrate the proposed framework over three datasets and three editing methods, resulting in the analysis of over 29k generated images.

• We propose a standardized evaluation framework for face-editing models, addressing image quality, identity preservation, and attribute disentanglement. The framework integrates full-reference (SSIM, LPIPS, FID) and no-reference (DiffQA(R)-AI-KD) evaluation methods, along with identity preservation and attribute entanglement analysis.
• We introduce metrics to quantify identity preservation, facial landmark deformation, and entanglement between attributes, enabling a comprehensive assessment of generative face-editing models.
• We apply the proposed framework to three models: StarGAN, VecGAN, and DiffAE. The results reveal that stronger attribute edits often increase entanglement and reduce identity preservation, highlighting key areas for improvement in future models.
• Our experiments highlight the impact of biased training data on attribute entanglement. For instance, CelebA, composed of celebrity faces, exhibits demographic skew, with age and gender biases. These biases affect face-editing models, limiting accuracy and generalization across underrepresented groups.
Bour et al. (Fri,) studied this question.
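To make two of the criteria above concrete, the following is a minimal sketch of how identity preservation and attribute entanglement could be scored for a single source/edited image pair. It is an illustration under stated assumptions, not the framework's actual definition. It assumes that face embeddings and per-attribute classifier scores have already been computed by some external models. The function names `identity_similarity` and `attribute_shift` are hypothetical, as is the choice of cosine similarity and mean absolute change.

```python
import numpy as np


def identity_similarity(emb_src, emb_edit):
    """Cosine similarity between two face embeddings (assumed precomputed).

    Values near 1 suggest the edit preserved identity; this choice of
    measure is an assumption, not taken from the source text.
    """
    a = np.asarray(emb_src, dtype=float)
    b = np.asarray(emb_edit, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def attribute_shift(scores_src, scores_edit, target):
    """Split attribute score changes into target and non-target parts.

    scores_src / scores_edit: per-attribute classifier scores (hypothetical
    inputs) before and after editing. Returns the change on the targeted
    attribute and the mean absolute change on all other attributes, used
    here as a simple stand-in for entanglement.
    """
    delta = np.asarray(scores_edit, dtype=float) - np.asarray(scores_src, dtype=float)
    others = np.delete(delta, target)
    return float(delta[target]), float(np.mean(np.abs(others)))


# Toy usage with made-up numbers: attribute 0 is the edited attribute.
sim = identity_similarity([0.2, 0.9, 0.4], [0.25, 0.85, 0.45])
target_change, entanglement = attribute_shift([0.1, 0.5, 0.5], [0.9, 0.5, 0.7], target=0)
print(sim, target_change, entanglement)
```

Under this sketch, a good edit produces a large change on the targeted attribute, a small mean change on the others, and an identity similarity close to 1.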