ABSTRACT Visual object tracking is widely applied in intelligent transportation systems and visual surveillance systems that serve smart cities, as well as in autonomous vehicles. Existing methods usually utilise a relation‐modelling framework to model the visual object tracking problem, with auxiliary spatial context and temporal information. The spatial context is often extracted by enlarging the target template, which can introduce more background and positional information. The temporal correlation is obtained by associating the search image with previous images. However, owing to noise interference, existing methods often exploit the auxiliary data only partially, leaving spatiotemporal information underutilised. To address these issues, we propose a novel and concise tracking framework that uniformly encodes all auxiliary data, including the enlarged target template, previous images, and their corresponding target bounding boxes. Specifically, to mitigate the unstable factors introduced by these raw inputs, we propose a spatiotemporal context adaptive encoder, which adaptively selects appropriate information from noisy data. Extensive experiments show that the proposed method achieves state‐of‐the‐art performance on various benchmarks.
Zhao et al. (Wed,) studied this question.