What question did this study set out to answer?

The aim is to develop a robust semi-supervised, domain-adaptive framework for object detection that addresses domain shifts and class imbalances.

March 3, 2026 · Open Access

SSRT-DETR: Domain-Adaptive Semi-Supervised Detector

Key Points

Introduced a mean teacher-student architecture, trained with style-transferred images, to jointly model the source and target domains.
Developed Domain-Aware Matching (DAM) to stabilize Hungarian assignment during the early stages of cross-domain training.
Enhanced matching by adding a teacher-guided decoder-query consistency term to the matching cost, annealed as training converges.
Implemented Class-/Scene-Adaptive Pseudo-Labeling (CAP) to adapt classification and IoU thresholds across classes and scenes.
Improved mean Average Precision (mAP@0.5) from 51.0 to 54.3 on Cityscapes→Foggy Cityscapes.
Achieved 67.3 Average Precision (AP) on KITTI→Cityscapes and 64.9 AP on Sim10K→Cityscapes (car category).
Showed consistent gains on rare object categories and in adverse weather scenes.
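The matching idea behind DAM, together with the mean-teacher update it relies on, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The teacher-guided consistency term is assumed to be the EMA teacher's own matching cost for each query-target pair. The cosine annealing schedule and the names `ema_update`, `dam_weight`, `domain_aware_match`, `lambda0` and `anneal_steps` are hypothetical. A brute-force exact assignment stands in for the Hungarian algorithm so the example needs only the standard library.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def ema_update(teacher: List[float], student: Sequence[float],
               alpha: float = 0.999) -> List[float]:
    """Mean-teacher update: teacher <- alpha * teacher + (1 - alpha) * student.

    Parameters are flattened to plain lists of floats for illustration.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def dam_weight(step: int, anneal_steps: int, lambda0: float) -> float:
    """Hypothetical cosine schedule: lambda0 at step 0, decaying to 0 at anneal_steps."""
    progress = min(step / anneal_steps, 1.0)
    return lambda0 * 0.5 * (1.0 + math.cos(math.pi * progress))


def min_cost_assignment(cost: Matrix) -> List[Tuple[int, int]]:
    """Exact min-cost matching by brute force (stand-in for the Hungarian algorithm).

    cost[q][g] is the cost of assigning query q to ground-truth target g.
    Returns (query, target) pairs ordered by target index.
    """
    num_queries, num_targets = len(cost), len(cost[0])
    if num_queries < num_targets:
        raise ValueError("need at least as many queries as targets")
    best_perm, best_total = None, math.inf
    # perm[g] is the query assigned to target g.
    for perm in itertools.permutations(range(num_queries), num_targets):
        total = sum(cost[q][g] for g, q in enumerate(perm))
        if total < best_total:
            best_perm, best_total = perm, total
    return [(q, g) for g, q in enumerate(best_perm)]


def domain_aware_match(std_cost: Matrix, teacher_cost: Matrix, step: int,
                       anneal_steps: int, lambda0: float = 1.0) -> List[Tuple[int, int]]:
    """Add an annealed teacher-guided term to the standard matching cost, then match."""
    lam = dam_weight(step, anneal_steps, lambda0)
    fused = [[c + lam * t for c, t in zip(row_c, row_t)]
             for row_c, row_t in zip(std_cost, teacher_cost)]
    return min_cost_assignment(fused)


if __name__ == "__main__":
    std = [[1.0, 2.0], [1.5, 1.6]]      # standard matching cost (query x target)
    teacher = [[3.0, 0.0], [0.0, 3.0]]  # hypothetical teacher-guided consistency term
    print(domain_aware_match(std, teacher, step=0, anneal_steps=100))    # [(1, 0), (0, 1)]
    print(domain_aware_match(std, teacher, step=100, anneal_steps=100))  # [(0, 0), (1, 1)]
```

Early in training the teacher term flips the assignment; once the weight has decayed to zero, matching reduces to the standard cost, which mirrors how the abstract describes DAM being annealed back to standard matching. For realistic query counts, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the brute-force search.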

Abstract

Domain-adaptive object detection (DAOD) under set-prediction paradigms remains challenging, as Hungarian matching is sensitive to domain shift and fixed pseudo-label thresholds cannot simultaneously handle class imbalance and scene variability. This paper presents SSRT-DETR, a semi-supervised, domain-adaptive framework built on the real-time detector RT-DETR. We adopt a mean teacher–student architecture with style-transferred images to jointly model source and target domains. To stabilize the assignment process during the early stages of cross-domain training, Domain-Aware Matching (DAM) augments the Hungarian matching cost with a teacher-guided decoder-query consistency term. Leveraging the more stable representations of the exponential moving average (EMA) teacher, DAM guides early matching toward domain-consistent assignments and is gradually annealed to recover standard matching as training converges. In parallel, we introduce Class-/Scene-Adaptive Pseudo-Labeling (CAP) to address a key limitation of existing DAOD methods: they rely on fixed or globally tuned pseudo-label thresholds, which struggle with class imbalance and scene-dependent difficulty under domain shift. CAP leverages per-class confidence statistics and multi-view consistency to adapt classification and IoU thresholds across classes and scenes, while temperature scaling and quality-weighted losses provide soft control over pseudo-label reliability. Experiments on standard benchmarks demonstrate the robustness of SSRT-DETR. On Cityscapes→Foggy Cityscapes, SSRT-DETR improves mAP@0.5 from 51.0 to 54.3. On KITTI→Cityscapes and Sim10K→Cityscapes, it achieves 67.3 AP and 64.9 AP on the car category, respectively, clearly outperforming the RT-DETR baseline while maintaining real-time efficiency. Notably, consistent gains are observed in rare categories and adverse weather scenarios, validating the effectiveness of the proposed DAM and CAP modules.
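CAP's adaptive thresholding can be illustrated with a small Python sketch. The formulas are assumptions chosen for illustration, not the paper's. Per-class classification thresholds are scaled by each class's mean teacher confidence, with a floor. A scene-level agreement score (the mean cross-view IoU of an image's candidate boxes) relaxes both the classification and IoU thresholds in harder scenes. Each kept box receives a soft weight equal to confidence times cross-view IoU, for use in a quality-weighted loss. All names (`softmax_with_temperature`, `class_thresholds`, `Candidate`, `select_pseudo_labels`, `beta`, `iou_min`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax; temperature > 1 softens overconfident scores."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def class_thresholds(mean_conf: Dict[str, float], base: float = 0.8,
                     floor: float = 0.5) -> Dict[str, float]:
    """Hypothetical per-class thresholds from per-class mean teacher confidence.

    The most confident class gets `base`; less confident classes (often rare
    ones) get proportionally lower thresholds, never below `floor`.
    """
    top = max(mean_conf.values())
    return {c: max(floor, base * mu / top) for c, mu in mean_conf.items()}


@dataclass
class Candidate:
    label: str
    confidence: float  # teacher class score, e.g. after temperature scaling
    view_iou: float    # IoU between the boxes predicted on two augmented views


def select_pseudo_labels(cands: List[Candidate], thresholds: Dict[str, float],
                         beta: float = 0.2,
                         iou_min: float = 0.5) -> List[Tuple[str, float]]:
    """Return (label, loss_weight) pairs for candidates that pass the
    scene-adjusted classification and IoU thresholds."""
    if not cands:
        return []
    # Scene agreement: mean cross-view IoU over this image's candidates.
    agreement = sum(c.view_iou for c in cands) / len(cands)
    # Lower agreement (a harder scene, e.g. dense fog) relaxes both thresholds.
    scene_scale = 1.0 - beta * (1.0 - agreement)
    iou_thr = iou_min * scene_scale
    kept = []
    for c in cands:
        cls_thr = thresholds[c.label] * scene_scale
        if c.confidence >= cls_thr and c.view_iou >= iou_thr:
            # Soft quality weight for a quality-weighted loss.
            kept.append((c.label, c.confidence * c.view_iou))
    return kept


if __name__ == "__main__":
    thr = class_thresholds({"car": 0.9, "bicycle": 0.6, "train": 0.3})
    cands = [Candidate("car", 0.85, 0.9),
             Candidate("train", 0.55, 0.7),
             Candidate("bicycle", 0.45, 0.8)]
    print([label for label, _ in select_pseudo_labels(cands, thr)])  # ['car', 'train']
```

In this sketch, lowering thresholds for low-confidence classes lets rare categories contribute pseudo-labels at all, while the quality weight keeps those less certain boxes from dominating the loss. This is one possible reading of the abstract's "soft control over pseudo-label reliability".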

Cite This Study

Zhang et al. (2026) studied this question.

synapsesocial.com/papers/69a67f06f353c071a6f0ad37 https://doi.org/10.3390/s26051539