Visual Language Models (VLMs) combine natural language processing and computer vision to interpret multimodal data such as images and text, and they show great potential for image classification. This paper investigates integrating Active Learning (AL) and pseudo-labeling techniques with VLMs to improve image classification across various domains. To this end, five AL strategies (Random Sampling, Uncertainty Sampling, Margin Sampling, Entropy Sampling, and Query-by-Committee) and three pseudo-labeling approaches (Direct, Confidence Threshold, and Feature Similarity) were evaluated iteratively. The results show that combining active learning with pseudo-labeling achieves promising results and reaches full class coverage within a few iterations. We conclude that integrating AL with feature-similarity-based pseudo-labeling offers a robust and efficient solution for image classification in limited-data scenarios. It promotes high accuracy and class representativeness while reducing error propagation, with potential applications in critical domains such as healthcare and industry.
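The abstract names the acquisition strategies and pseudo-labeling approaches without defining them. Below is a minimal pure-Python sketch of common textbook forms of each. It assumes the VLM yields, per image, a class-probability vector and an embedding vector. Several details are illustrative assumptions, not the paper's implementation: the function names, the thresholds `tau` and `min_sim`, vote entropy as the Query-by-Committee disagreement measure, and nearest-neighbour cosine similarity as the feature-similarity rule. Random Sampling needs no score; it simply draws unlabeled images uniformly (e.g. with `random.sample`).

```python
import math
from typing import List, Optional, Sequence

# Hypothetical helpers illustrating the named techniques; not the paper's code.
Probs = Sequence[float]  # predicted class distribution for one image


# --- Active learning: score each unlabeled image, then query the most uncertain ---

def least_confidence(p: Probs) -> float:
    """Uncertainty sampling: 1 - max probability (higher = more uncertain)."""
    return 1.0 - max(p)


def margin(p: Probs) -> float:
    """Margin sampling: top-1 minus top-2 probability (lower = more uncertain)."""
    top1, top2 = sorted(p, reverse=True)[:2]
    return top1 - top2


def entropy(p: Probs) -> float:
    """Entropy sampling: Shannon entropy in nats (higher = more uncertain)."""
    return -sum(q * math.log(q) for q in p if q > 0)


def vote_entropy(votes: Sequence[int], n_classes: int) -> float:
    """Query-by-Committee (assumed variant): entropy of committee hard-label votes."""
    counts = [0] * n_classes
    for v in votes:
        counts[v] += 1
    return entropy([c / len(votes) for c in counts])


def select(scores: Sequence[float], k: int, higher_is_uncertain: bool = True) -> List[int]:
    """Indices of the k most uncertain images, to be sent to a human annotator."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_uncertain)
    return order[:k]


# --- Pseudo-labeling: label images that were NOT queried; None = leave unlabeled ---

def threshold_pseudo_labels(probs: Sequence[Probs], tau: float) -> List[Optional[int]]:
    """Confidence threshold: keep the argmax label only if its probability >= tau.
    With tau = 0.0 this reduces to direct pseudo-labeling (accept every argmax)."""
    labels: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda c: p[c])
        labels.append(best if p[best] >= tau else None)
    return labels


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def similarity_pseudo_labels(unlabeled, labeled, labeled_y, min_sim: float):
    """Feature similarity (assumed rule): copy the label of the most similar
    human-labeled embedding, only if cosine similarity >= min_sim."""
    labels: List[Optional[int]] = []
    for u in unlabeled:
        sims = [cosine(u, f) for f in labeled]
        j = max(range(len(sims)), key=lambda i: sims[i])
        labels.append(labeled_y[j] if sims[j] >= min_sim else None)
    return labels
```

For example, given the distributions `[0.9, 0.05, 0.05]`, `[0.4, 0.35, 0.25]`, and `[0.5, 0.5, 0.0]`, least-confidence scoring queries the second image, while margin scoring queries the third. The thresholded variants return `None` rather than guessing. This is one plausible way a similarity or confidence gate could limit the error propagation the abstract mentions.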
Amorim et al. (Mon,) studied this question.