What question did this study set out to answer?

The study set out to improve the accuracy of facial attribute recognition (race, gender, age group, and emotion) from face images, using a system of fine-tuned vision language models.

February 26, 2026 | Open Access

FaceScanPaliGemma multi-agent vision language models for facial attribute recognition

Key Points

Aimed to improve the accuracy of facial attribute recognition from face images.
Developed FaceScanPaliGemma, a multi-agent vision language model (VLM) system.
Used four fine-tuned Google PaliGemma models, each specialized for one attribute: race, gender, age group, or emotion.
Evaluated performance on the public FairFace and AffectNet datasets.
Compared against other VLMs (OpenAI GPT, Google Gemini, LLaVA, and Google PaliGemma) under zero-shot evaluation.
Achieved accuracies of up to 81.1% for race, 95.8% for gender, 80.0% for age group, and 59.4% for emotion classification.
Outperformed the compared VLMs on these classification tasks.

Abstract

Technologies for recognizing facial attributes such as race, gender, age, and emotion from images of human faces have several applications, including personalized advertising, sentiment analysis, and the study of demographic trends and social behaviors. Analyzing face images and facial expressions presents several challenges due to the complexity of human facial attributes and the diversity in representation. While numerous attempts have been made to improve facial attribute classification performance, there remains a strong demand for enhanced accuracy. In this paper, we propose "FaceScanPaliGemma," a multi-agent vision language model (VLM) system consisting of four fine-tuned Google PaliGemma models, each specialized for a specific facial attribute classification. To evaluate the proposed solution, we used the public "FairFace" and "AffectNet" datasets. The results show high accuracy, reaching up to 81.1%, 95.8%, 80.0%, and 59.4% for race, gender, age group, and emotion classification, respectively, outperforming other VLMs such as OpenAI GPT, Google Gemini, LLaVA, and Google PaliGemma under zero-shot evaluation.
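
The multi-agent arrangement described in the abstract, one specialized model per attribute, can be pictured as a simple dispatcher that sends the same face image to four independent classifiers and collects their answers. The sketch below is illustrative only and is not the authors' implementation: `FaceAttributeSystem`, `make_stub_agent`, and the placeholder labels are hypothetical names, and each stub stands in for what would be a fine-tuned PaliGemma model in the real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: an "agent" maps raw image bytes to one predicted label.
Agent = Callable[[bytes], str]


@dataclass
class FaceAttributeSystem:
    """Holds one agent per facial attribute and queries them all."""
    agents: Dict[str, Agent]

    def analyze(self, image: bytes) -> Dict[str, str]:
        # Every agent receives the same image and answers independently.
        return {name: agent(image) for name, agent in self.agents.items()}


def make_stub_agent(label: str) -> Agent:
    """Stand-in for a fine-tuned model: ignores the image, returns `label`."""
    def agent(image: bytes) -> str:
        return label
    return agent


# Assumption: four agents, matching the four attributes named in the abstract.
system = FaceAttributeSystem(agents={
    "race": make_stub_agent("placeholder-race"),
    "gender": make_stub_agent("placeholder-gender"),
    "age_group": make_stub_agent("placeholder-age-group"),
    "emotion": make_stub_agent("placeholder-emotion"),
})

result = system.analyze(b"not a real image")
print(result)
```

Keeping the agents behind a common callable interface means any one of them could be swapped or retrained without touching the others, which is one plausible motivation for a per-attribute design.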
