March 13, 2026 · Open Access

Evaluating AI models for food and alcohol advertisement classification against human benchmarks

Key Points

The aim is to evaluate how well AI models can classify food and alcohol advertisements compared to human experts.
Collected 1000 Facebook ads from major Belgian brands and had them annotated by crowd workers and dieticians.
Compared the AI models' classifications with the dietician consensus to assess accuracy.
Conducted a bias analysis to examine how AI models interpret advertisement features.
GPT-4o and Qwen reached over 90% agreement with the dietician consensus for single-option features.
Agreement was lower for multiple-choice features but remained within the variability of crowd rater responses.
Identified advertisement labels that AI models consistently under- or over-detect.
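The agreement figures above compare each model against a dietician consensus and against the agreement dieticians reach with one another. The sketch below shows one simple way such numbers can be computed for a single-option feature. It assumes plain percent agreement and a majority-vote consensus with ties left unresolved; the function names, the tie rule and the toy "alcohol present?" labels are hypothetical and not taken from the paper, which may use different definitions.

```python
from collections import Counter
from itertools import combinations


def majority_consensus(labels):
    """Most common label among annotators; None on a tie (hypothetical tie rule)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def percent_agreement(a, b):
    """Share of items where two label sequences match, skipping unresolved (None) items."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("nan")
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_pairwise_agreement(raters):
    """Average percent agreement over every pair of raters."""
    scores = [percent_agreement(r1, r2) for r1, r2 in combinations(raters, 2)]
    return sum(scores) / len(scores)


# Hypothetical toy data: "alcohol present?" for five ads, three dieticians, one model.
dieticians = [
    ["yes", "no", "no", "yes", "no"],
    ["yes", "no", "yes", "yes", "no"],
    ["yes", "no", "no", "yes", "yes"],
]
model = ["yes", "no", "no", "no", "no"]

# One consensus label per ad, taken across the three dieticians.
consensus = [majority_consensus(item) for item in zip(*dieticians)]
print(consensus)
print(percent_agreement(model, consensus))
print(mean_pairwise_agreement(dieticians))
```

On this toy data the model matches the consensus on four of five ads (0.8), while the mean pairwise agreement between dieticians is about 0.73, which is the kind of side-by-side comparison the key points describe.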

Abstract

The growth of food and alcohol marketing on social media creates a need for scalable monitoring methods that go beyond manual processing. This study evaluates whether Large Language Models and Vision-Language Models can recognize advertisements and identify their features in line with general public or expert opinion. We collected 1000 Facebook ads from major Belgian brands and had them annotated by 600 crowd workers, three dieticians and four AI models (GPT-4o, Qwen 2.5, Pixtral and Gemma3). For single-option advertisement features, such as alcohol presence or target group, GPT-4o and Qwen reached agreement with the dietician consensus above 90%, similar to the pairwise agreement observed between individual dieticians. Although agreement was lower for multiple-choice features, such as premium offers and marketing strategies, it remained within the variability observed among crowd raters. The bias analysis revealed how models interpret certain labels, with some consistently under- or over-detected. Based on these findings, we propose tiered deployment recommendations that distinguish between ad features that multimodal large language models (MLLMs) can already monitor with human-level accuracy and more complex features, such as marketing strategies or food categories, that require expert oversight and taxonomy refinement.
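For multiple-choice features, an ad can carry several labels at once, so exact-match agreement is harsh and per-label bias becomes visible. The sketch below illustrates two hypothetical summaries: a per-label count difference (positive suggests over-detection, negative suggests under-detection) and mean Jaccard overlap per ad. Both metrics, the function names and the "marketing strategy" labels are assumptions made for illustration, not the paper's bias analysis.

```python
from collections import Counter


def detection_bias(model_sets, reference_sets):
    """Per label: times the model assigns it minus times the reference does.

    Positive values suggest over-detection, negative values under-detection.
    (Hypothetical summary, not necessarily the paper's bias metric.)
    """
    model_counts = Counter(label for s in model_sets for label in s)
    ref_counts = Counter(label for s in reference_sets for label in s)
    labels = set(model_counts) | set(ref_counts)
    # Counter returns 0 for labels it has never seen.
    return {label: model_counts[label] - ref_counts[label] for label in sorted(labels)}


def jaccard(a, b):
    """Set overlap for one multiple-choice item; two empty sets count as full agreement."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy data: marketing strategies chosen per ad.
reference_labels = [{"discount"}, {"celebrity", "discount"}, set(), {"health_claim"}]
model_labels = [{"discount", "health_claim"}, {"discount"}, {"health_claim"}, {"health_claim"}]

bias = detection_bias(model_labels, reference_labels)
mean_jaccard = sum(jaccard(m, r) for m, r in zip(model_labels, reference_labels)) / len(model_labels)
print(bias)
print(mean_jaccard)
```

Here the model over-assigns "health_claim" and misses "celebrity", a pattern of the kind the bias analysis is described as surfacing, even though the average set overlap is a moderate 0.5.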


Cite This Study

Gitu et al. (2026). Evaluating AI models for food and alcohol advertisement classification against human benchmarks.

synapsesocial.com/papers/69b3abb202a1e69014cccd8e
https://doi.org/10.1038/s41598-026-42426-x
