Existing target detection systems based on convolutional neural networks (CNNs) rely on manual labeling, which is time-consuming and error-prone, especially in images where targets have low visibility against complex backgrounds. Manual labeling can miss or mislabel targets, and labeled images are scarce compared with the vast amount of unlabeled data. To address these issues, this paper proposes a semi-supervised target detection network based on the Transformer and contrastive learning. The Transformer component combines some properties of convolutional networks with the global receptive field of self-attention to obtain stronger semantic representations and better performance. A supervised contrastive learning strategy, which combines a supervised contrastive metric with the network architecture, strengthens the model's ability to discriminate targets despite large intra-class diversity and high inter-class similarity. A semi-supervised learning technique trains the model on partially labeled data together with a large quantity of unlabeled data, improving its training performance and generalization capability. Experiments on several subsets extracted from the VOC2007 and VOC2012 datasets show that the proposed method improves AP50 by 9.4% and AR50 by 3.2% compared with the baseline Faster R-CNN method, demonstrating its effectiveness.
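The abstract does not give the exact form of its supervised contrastive metric. As a point of reference only, the sketch below implements the widely used supervised contrastive (SupCon) loss in NumPy. It pulls embeddings that share a class label together and pushes embeddings of different classes apart, which is the mechanism for handling intra-class diversity and inter-class similarity described above. The function name `supervised_contrastive_loss`, the default temperature of 0.1, and the choice to average over positives per anchor are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Hypothetical SupCon-style loss; not the paper's exact formulation.

    features: (N, D) array of embeddings (e.g. per-proposal features).
    labels:   length-N sequence of integer class labels.
    """
    # L2-normalise so that dot products are cosine similarities
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    n = z.shape[0]
    logits = z @ z.T / temperature

    # An anchor is never compared with itself
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over all other samples in the batch
    row_max = np.max(logits, axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))

    # Positives: other samples that share the anchor's class label
    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    if not np.any(valid):
        raise ValueError("need at least one pair of samples with the same label")

    # Mean log-probability assigned to positives, for each anchor with positives
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_pos = pos_log_prob[valid] / pos_counts[valid]
    return float(-mean_log_pos.mean())
```

With embeddings that already cluster by class, the loss is small. If the same embeddings are paired with labels that cut across those clusters, the loss grows, and that gradient signal is what reshapes the feature space during training.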
Li et al. studied this question.