Abstract
One-stream transformer trackers have received widespread attention for their excellent discriminative ability. However, most existing trackers focus on mining more information about the target while neglecting the background around it. In this work, we propose a single-stream progressive background elimination transformer for target tracking. The model employs a progressive attention mechanism to fully mine the relational dependencies between the target context and the search area. First, to mitigate interference from similar backgrounds within the same category, we introduce a Residual Target Background Interaction Module. This module improves the accuracy of the early candidate mechanism by leveraging residual links and similarity calculations, effectively eliminating background tokens that resemble the target. Second, we feed the processed tokens into the bidirectional optimal channel select module, which employs optimized attention channels to further eliminate non-informative background elements while preserving critical target-related features. Finally, we develop an autoregressive dynamic template updating mechanism that selectively preserves high-quality tracking tokens from current predictions; these refined tokens subsequently serve as reference templates for frame-to-frame processing. This design enhances representation accuracy for ongoing background suppression while maintaining temporal consistency.
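As a rough illustration of the similarity calculation behind an early candidate mechanism, the sketch below ranks search-area tokens by their best cosine similarity to any template token and keeps only the top fraction. It is a generic, minimal example and not the proposed Residual Target Background Interaction Module: the function names, the `keep_ratio` parameter, and the toy two-dimensional tokens are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keep_candidate_tokens(template_tokens, search_tokens, keep_ratio=0.5):
    """Return indices (in original order) of the search tokens to keep.

    Hypothetical helper: each search token is scored by its highest cosine
    similarity to any template token, and the lowest-scoring tokens are dropped.
    """
    scores = [max(cosine(s, t) for t in template_tokens) for s in search_tokens]
    n_keep = max(1, math.ceil(keep_ratio * len(search_tokens)))
    ranked = sorted(range(len(search_tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])

# Toy data (assumed): two template tokens and four search-area tokens.
template = [[1.0, 0.0], [0.9, 0.1]]
search = [[1.0, 0.05], [0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]]

kept = keep_candidate_tokens(template, search, keep_ratio=0.5)
print(kept)
```

Note that a ranking based on similarity alone also keeps background tokens that happen to resemble the target, which is the kind of interference the abstract says the residual interaction module is meant to reduce.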
Zhang et al. (Sun,) studied this question.