Unsupervised domain adaptive (UDA) object detection methods improve model robustness in the target domain without requiring target-domain annotations. Despite notable progress, existing methods face two major challenges: 1) insufficient and inefficient learning of holistic feature consistency, because pixel-level style matching and the elimination of semantic discrepancies between domains are cumbersome and their collaborative effect is overlooked, and 2) unreliable learning of category feature compactness, caused by poor-quality target-domain samples, inaccurate pseudo-labels, and noisy cross-domain contrast paradigms. To address these challenges, we propose a novel Semantic Consistency and Compactness Learning (SCCL) network. For consistency learning, we introduce a Visual Adaptation-guided Semantic Alignment (VSA) module that achieves style matching through simple feature adaptation and incorporates a novel adversarial-free self-supervised method for feature disentanglement. The collaboration between these two components enables sufficient and efficient consistency learning. For reliable compactness learning, we develop a plug-and-play Instance Center-Contrastive (ICC) head that, for the first time, comprehensively addresses all three potential causes of unreliable learning through three integrated innovations: sample pseudo-label quality enhancement, reliable sample storage and updating, and a robust sample contrast paradigm. Moreover, VSA and ICC mutually reinforce each other, simultaneously enhancing feature transferability and discriminability. Extensive experiments across four UDA object detection benchmarks with two baselines show that SCCL achieves superior adaptability and robustness.
Liu et al. studied this question.