Although Transformer-based trackers have achieved impressive tracking accuracy owing to their strong capability for global context modeling, they still suffer from high model complexity and computational latency. To address these limitations, this paper proposes a lightweight Transformer-based single object tracking method, termed TPTTrack. Specifically, a target-state-guided prompt token is introduced and concatenated with the template and search-region features. Constructed from compact target-state information, this token steers cross-region feature interaction toward target-relevant information, thereby enhancing tracking robustness. Furthermore, a hierarchical attention decoupling mechanism is developed to improve the efficiency of shallow feature extraction and to reduce redundant self-attention in deeper layers. In addition, a lightweight autoregressive prediction module is employed for dynamic target-state modeling and efficient state estimation. Experimental results, including 65.0% AO on GOT-10k and 68.5% precision on LaSOT, demonstrate that TPTTrack achieves a favorable trade-off between tracking accuracy and computational cost.
Zhu et al. studied this question.
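As an illustration only, the following NumPy sketch shows one plausible way a target-state prompt token could be concatenated with template and search-region tokens so that a single joint self-attention pass lets every token attend to it. The helper names (`make_prompt_token`, `joint_attention`), the tanh state projection, the single-head attention, the toy sizes, and the random weights are all hypothetical assumptions, not TPTTrack's actual implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def make_prompt_token(state: np.ndarray, w_state: np.ndarray) -> np.ndarray:
    # Hypothetical: project a compact target state (e.g. a normalized box
    # [cx, cy, w, h]) into a single d-dimensional prompt token.
    return np.tanh(state @ w_state)


def joint_attention(prompt, template, search, wq, wk, wv):
    # Concatenate [prompt; template; search] along the token axis so that one
    # single-head self-attention pass lets every token see the prompt token.
    tokens = np.concatenate([prompt[None, :], template, search], axis=0)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    out = attn @ v
    # Split the output back into prompt, template, and search groups.
    n_t = template.shape[0]
    return out[0], out[1:1 + n_t], out[1 + n_t:], attn


rng = np.random.default_rng(0)
d, n_template, n_search = 16, 4, 9  # toy sizes, not the paper's settings
state = np.array([0.5, 0.5, 0.2, 0.3])  # hypothetical normalized box
w_state = rng.normal(size=(4, d))
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
template = rng.normal(size=(n_template, d))
search = rng.normal(size=(n_search, d))

prompt = make_prompt_token(state, w_state)
p_out, t_out, s_out, attn = joint_attention(prompt, template, search, wq, wk, wv)
print(s_out.shape, attn.shape)  # (9, 16) (14, 14)
```

Concatenating the prompt as an extra token keeps the cost of conditioning on the target state to one additional row and column of the attention matrix, which fits a lightweight design; how the real tracker builds and updates the token is not specified here.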