What question did this study set out to answer?

This survey aims to evaluate self-supervised learning strategies for effective object detection in challenging environments.

April 28, 2026 · Open Access

Self-supervised learning for object detection in challenging settings: A survey

Key Points

A detailed comparison of self-supervised learning (SSL) methods for object detection.
Benchmarks fine-tune a Faster R-CNN initialized with several SSL methods on COCO and on a domain-specific dataset.
Methods are assessed in few-shot scenarios and at inference on noisy inputs.
Combining approaches with complementary local and global biases improves performance.
The effectiveness of the evaluated SSL strategies varies with the type of encoder.
Pre-training on domain-specific datasets is significant for improving detection outcomes.
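
To make the few-shot assessment concrete, here is a minimal sketch of how a K-shot training subset could be drawn from COCO-style annotations. It is an illustration, not the survey's protocol: the function name `sample_k_shot`, the greedy selection rule, and the reduced annotation dicts (only `image_id` and `category_id`) are all assumptions.

```python
import random
from collections import defaultdict


def sample_k_shot(annotations, k, seed=0):
    """Greedily pick images until every category has at least k instances
    (or all of its instances, if fewer than k exist).

    `annotations` is a list of dicts with COCO-style keys
    'image_id' and 'category_id' (hypothetical, reduced format).
    Returns the annotations that belong to the selected images.
    """
    rng = random.Random(seed)

    # Number of instances of each category in each image.
    per_cat = defaultdict(lambda: defaultdict(int))
    for ann in annotations:
        per_cat[ann["category_id"]][ann["image_id"]] += 1

    selected = set()
    for cat in sorted(per_cat):
        images = sorted(per_cat[cat])
        rng.shuffle(images)
        # Instances of this category already covered by earlier picks.
        count = sum(n for img, n in per_cat[cat].items() if img in selected)
        for img in images:
            if count >= k:
                break
            if img not in selected:
                selected.add(img)
                count += per_cat[cat][img]

    return [a for a in annotations if a["image_id"] in selected]
```

Because an image can contain several categories, some categories may end up with more than `k` instances; stricter protocols would need extra filtering.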

Abstract

Self-supervised learning (SSL) has shown great promise in computer vision, enabling networks to learn meaningful representations from large unlabeled datasets. SSL methods fall into two main categories: instance discrimination and image modeling. While instance discrimination is fundamental to SSL, it was originally designed for classification and may be less effective for downstream tasks that require fine-grained or spatially localized representations. In this focused survey, we study SSL for object detection under challenging practical conditions, with particular emphasis on small object detection, domain shift, and few-shot learning. Building upon previous surveys, we not only provide a detailed comparison of SSL strategies, but also assess their effectiveness for object detection using both CNN and ViT-based architectures. To keep the benchmark fair, we ourselves fine-tune a Faster R-CNN initialized with several exemplary SSL methods, including object-level instance discrimination and masked image modeling methods, on the widely used COCO dataset as well as on a domain-specific dataset focused on vehicle detection in infrared remote sensing imagery. We also evaluate the impact of pre-training on custom domain-specific datasets, highlighting how some SSL strategies are better suited to handling uncurated data. Furthermore, we assess the methods in few-shot settings and at inference on noisy input, revealing important behavioral differences depending on the type of encoder used. Our findings highlight that combining approaches with complementary local and global biases improves performance across the evaluated object detection settings. Overall, this survey provides a practical guide for selecting optimal SSL strategies in different scenarios.

• We propose a survey on self-supervised learning for real-world object detection.
• In our benchmarks, we pay attention to small object detection performance.
• Challenging conditions such as frugal settings or remote sensing data are considered.
• The benefits of pre-training on custom domain-specific datasets are assessed.
• A road map for selecting appropriate self-supervised learning strategies is provided.
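
The abstract mentions inference on noisy input without stating the noise model here. As a hedged illustration only, the sketch below assumes additive Gaussian noise on a grayscale image stored as rows of floats in [0, 1]; the function name `add_gaussian_noise` and the `sigma` parameter are hypothetical and not taken from the survey.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Return a noisy copy of `image`.

    `image` is a list of rows, each a list of floats in [0, 1].
    Zero-mean Gaussian noise with standard deviation `sigma` is added
    to every pixel (an assumed noise model), then values are clipped
    back into [0, 1].
    """
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


# Example: perturb a tiny 2x3 image at increasing noise levels.
clean = [[0.0, 0.5, 1.0], [0.25, 0.75, 0.5]]
noisy_versions = {sigma: add_gaussian_noise(clean, sigma) for sigma in (0.0, 0.05, 0.1)}
```

In a robustness study along these lines, a fixed detector would be evaluated on each perturbed copy so that any drop in accuracy can be attributed to the noise level alone.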

Authors

Alina Ciocarlan

Université Paris-Saclay

Sidonie Lefèbvre

Université Paris-Saclay

Sylvie Le Hégarat‐Mascle

Centre National de la Recherche Scientifique

Journals

Computer Vision and Image Understanding

Institutions

Université Paris-Saclay

Office National d'Études et de Recherches Aérospatiales

Laboratoire des systèmes et applications des technologies de l'information et de l'énergie
