Single-object tracking has progressed rapidly, but it remains fragile under low illumination, occlusion, and background clutter. RGB-Thermal (RGB-T) tracking improves robustness by exploiting the complementarity of the two modalities. However, many existing trackers do not dynamically switch the dominant modality as sensing quality changes, and they often rely on simple late fusion at a single stage, which underutilizes the multi-level features available across the backbone. To address these limitations, we propose CMCLTrack, a unified framework that integrates a Reliability-Modulated Cross-Modal Adapter (RMCA) and a Cross-Layer Mamba Fusion (CLMF) module. Specifically, RMCA performs reliability-aware bidirectional cross-modal interaction by dynamically weighting each modality's contribution, while CLMF efficiently aggregates complementary cues from multiple encoder layers to exploit multi-level representations. To stabilize the learning of layer-wise modality reliability, we additionally introduce a cross-layer reliability smoothness regularization. Extensive experiments on multiple RGB-T tracking benchmarks demonstrate that CMCLTrack achieves competitive performance compared with existing state-of-the-art methods.
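To make the two reliability ideas concrete, the following is a minimal illustrative sketch, not the CMCLTrack implementation. The abstract gives no formulas, so everything here is an assumption: reliability is modeled as a softmax over two hypothetical scalar scores per layer, fusion as a convex combination of feature vectors, and the smoothness regularizer as the mean squared difference between the RGB weights of adjacent layers. The function names `reliability_weights`, `fuse`, and `smoothness_penalty` are hypothetical.

```python
import math


def reliability_weights(rgb_score, tir_score):
    """Softmax over two hypothetical per-modality reliability scores."""
    m = max(rgb_score, tir_score)  # subtract the max for numerical stability
    e_rgb = math.exp(rgb_score - m)
    e_tir = math.exp(tir_score - m)
    total = e_rgb + e_tir
    return e_rgb / total, e_tir / total


def fuse(rgb_feat, tir_feat, rgb_score, tir_score):
    """Assumed fusion rule: convex combination weighted by reliability."""
    w_rgb, w_tir = reliability_weights(rgb_score, tir_score)
    return [w_rgb * r + w_tir * t for r, t in zip(rgb_feat, tir_feat)]


def smoothness_penalty(layer_weights):
    """Assumed regularizer: mean squared difference between the RGB
    reliability weights of adjacent encoder layers."""
    diffs = [(b - a) ** 2 for a, b in zip(layer_weights, layer_weights[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    # Low-light frame: the thermal score is higher, so thermal dominates.
    print(reliability_weights(-1.0, 2.0))
    # Per-layer RGB weights that jump abruptly incur a larger penalty.
    print(smoothness_penalty([0.2, 0.8, 0.3]))
    print(smoothness_penalty([0.4, 0.45, 0.5]))
```

Under these assumptions, a higher score for one modality shifts the fused feature toward it, and the penalty discourages the per-layer weights from oscillating between adjacent layers.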
Li et al. (Fri,) studied this question.