RGBT tracking has gradually become a research hotspot as a solution for tracking in complex environments. The strong complementarity between RGB and thermal infrared data enables trackers to work 24/7. Existing works usually adopt a symmetric network structure that applies an identical strategy to modalities with different properties, ignoring the heterogeneity among modalities. In this paper, we propose a novel asymmetric global-local mutual integration network that jointly considers asymmetric structure, heterogeneity-based global association, and inter-frame communication. It consists of an asymmetric mode-distinguishing parallel structure, cross-modal global-local interaction, and an inter-frame monitoring strategy. Specifically, the asymmetric mode-distinguishing parallel structure performs discriminative mining on the information of the two modalities by combining the discount module and the branch cement module, and extracts multi-scale cues through the multi-scale auxiliary module to handle the challenges of scale variation and small-size objects. Then, the global mining module is deployed in the cross-modal global-local interaction section to jointly perform intra-modal and inter-modal global correlation, while acting as the global complement to local feature extraction. Finally, the inter-frame monitoring strategy employs a fast optical flow algorithm to detect inter-frame displacement, helping the network better handle camera motion and fast object motion. Extensive experiments on the GTOT, RGBT234, and LasHeR datasets verify the effectiveness of the proposed network, and further ablation experiments confirm the efficacy of the asymmetric structure and its components.
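To make the idea of detecting inter-frame displacement concrete, the following is a minimal sketch only. The abstract does not say which fast optical flow algorithm is used, so this example substitutes a plain exhaustive block-matching search over small grayscale frames; it is not the authors' method. The names `estimate_shift`, `is_fast_motion`, `max_shift`, and `threshold` are hypothetical and chosen purely for illustration.

```python
import math


def estimate_shift(prev, curr, max_shift=3):
    """Estimate one global (dx, dy) displacement between two frames.

    Stand-in for an optical flow step (assumption, not the paper's
    algorithm): try every shift within +/- max_shift and keep the one
    with the lowest mean absolute difference over the overlapping area.
    Frames are lists of rows of numeric intensities of equal size.
    """
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost, count = 0.0, 0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        cost += abs(prev[y][x] - curr[yy][xx])
                        count += 1
            if count and cost / count < best_cost:
                best_cost, best = cost / count, (dx, dy)
    return best


def is_fast_motion(shift, threshold=2.0):
    """Flag a frame pair whose displacement magnitude exceeds a threshold.

    The threshold value is a hypothetical placeholder.
    """
    return math.hypot(*shift) > threshold


def make_frame(top, left, size=8):
    """Build a blank frame with a 2x2 bright blob at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 9
    return frame


prev_frame = make_frame(2, 2)
curr_frame = make_frame(3, 4)  # blob moved 2 px right, 1 px down
shift = estimate_shift(prev_frame, curr_frame)
print(shift, is_fast_motion(shift))
```

In a tracker, a flag like this could be used to widen the search region or otherwise adjust processing when large camera or object motion is detected; how the proposed network actually consumes the displacement is not described in the abstract.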
Mei et al. studied this question.