Domain-adaptive object detection (DAOD) under set-prediction paradigms remains challenging: Hungarian matching is sensitive to domain shift, and fixed pseudo-label thresholds cannot simultaneously handle class imbalance and scene variability. This paper presents SSRT-DETR, a semi-supervised, domain-adaptive framework built on the real-time detector RT-DETR. We adopt a mean teacher–student architecture with style-transferred images to jointly model the source and target domains. To stabilize assignment during the early stages of cross-domain training, we propose Domain-Aware Matching (DAM), which augments the Hungarian matching cost with a teacher-guided decoder-query consistency term. Leveraging the more stable representations of the exponential moving average (EMA) teacher, DAM guides early matching toward domain-consistent assignments and is gradually annealed so that standard matching is recovered as training converges. In parallel, we introduce Class-/Scene-Adaptive Pseudo-Labeling (CAP) to replace the fixed or globally tuned pseudo-label thresholds used by existing DAOD methods. CAP uses per-class confidence statistics and multi-view consistency to adapt classification and IoU thresholds across classes and scenes, while temperature scaling and quality-weighted losses provide soft control over pseudo-label reliability. Experiments on standard benchmarks demonstrate the robustness of SSRT-DETR. On Cityscapes→Foggy Cityscapes, SSRT-DETR improves mAP@0.5 from 51.0 to 54.3. On KITTI→Cityscapes and Sim10K→Cityscapes, it achieves 67.3 AP and 64.9 AP on the car category, respectively, outperforming the RT-DETR baseline while maintaining real-time efficiency. Consistent gains are also observed for rare categories and adverse-weather scenarios, supporting the effectiveness of the proposed DAM and CAP modules.
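The idea behind DAM, a Hungarian cost augmented with an annealed teacher-guided term, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the form of the consistency term or the annealing schedule. The sketch assumes that the teacher and student share query indices, that the consistency term is the teacher's own query-to-target cost, and that the weight decays linearly to zero. The names `dam_weight`, `domain_aware_match`, `lambda0`, and `anneal_steps` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def dam_weight(step, anneal_steps, lambda0=1.0):
    """Consistency weight, decayed linearly from lambda0 to 0 (assumed schedule)."""
    return lambda0 * max(0.0, 1.0 - step / anneal_steps)


def domain_aware_match(student_cost, teacher_cost, step, anneal_steps, lambda0=1.0):
    """Hungarian matching on a student cost augmented by a teacher-guided term.

    student_cost, teacher_cost: arrays of shape (num_queries, num_targets).
    teacher_cost stands in for the consistency term (an assumption): it is
    low where the EMA teacher's query i agrees with target j.
    Returns the matched (query_indices, target_indices).
    """
    lam = dam_weight(step, anneal_steps, lambda0)
    cost = np.asarray(student_cost) + lam * np.asarray(teacher_cost)
    return linear_sum_assignment(cost)


# Toy example: 3 decoder queries, 2 ground-truth targets.
student_cost = np.array([[0.5, 0.9], [0.4, 0.8], [0.9, 0.3]])
teacher_cost = np.array([[0.1, 0.9], [0.8, 0.9], [0.9, 0.2]])

early = domain_aware_match(student_cost, teacher_cost, step=0, anneal_steps=100)
late = domain_aware_match(student_cost, teacher_cost, step=100, anneal_steps=100)
print("early:", early[0].tolist(), "late:", late[0].tolist())
```

In this toy case the teacher term steers target 0 to query 0 at the start of training, while the fully annealed cost assigns it to query 1, which is what standard Hungarian matching on the student cost alone would do.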
Zhang et al. (Sat,) studied this question.