Target tracking for uncrewed aerial vehicles (UAVs) demands both low-latency, real-time inference and robust long-term temporal consistency. In practice, current approaches trade efficiency against stability. This tension is particularly pronounced on resource-limited UAV platforms: computationally heavy architectures can exceed onboard processing capacity and energy budgets, whereas overly lightweight models lose temporal state fidelity, which leads to cumulative drift under challenging conditions such as occlusion, motion blur, rapid scale variation, and cluttered backgrounds. To address this challenge, we propose SPM-Track, a lightweight yet temporally consistent tracking framework grounded in explicit state maintenance. It introduces a dual-loop judgment-calibration architecture comprising three coordinated components: (1) a content-aware state encoder, which uses input-gate modulation to model temporal dynamics selectively and suppress noise propagation into the state; (2) a hierarchical state manager, which coordinates short-term state updates with a long-term library of reliable snapshots through dual-path cooperation, improving robustness against long-term occlusions and appearance variations; and (3) an adaptive feature recalibration module, which applies joint spatial-channel discriminative weighting before response map generation to enhance target distinctiveness and mitigate background clutter interference. Experiments on UAV123, DTB70, UAVTrack112, and LaSOT show that SPM-Track outperforms lightweight baselines and remains competitive with several Transformer-based trackers, demonstrating a favorable trade-off between edge-deployable efficiency and long-term robustness in UAV-based tracking.
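To make the idea of explicit state maintenance concrete, the following is a minimal sketch of an input-gated state update combined with a snapshot library. It is not the SPM-Track implementation: the names `gated_update`, `SnapshotLibrary`, and `track_step`, the scalar gate parameters, and the confidence thresholds are all hypothetical, and the recalibration module is omitted. The gate here is a plain element-wise sigmoid, chosen only to illustrate how a content-dependent gate can limit how much of each new observation enters the state.

```python
import math
from dataclasses import dataclass, field


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_update(state, features, gate_weight=1.0, gate_bias=0.0):
    """Blend new features into the state through a content-dependent input gate.

    Assumption: a scalar weight and bias stand in for learned gate parameters.
    """
    new_state = []
    for s, x in zip(state, features):
        g = sigmoid(gate_weight * x + gate_bias)  # gate depends on the input itself
        new_state.append((1.0 - g) * s + g * x)   # convex blend of old state and input
    return new_state


@dataclass
class SnapshotLibrary:
    """Hypothetical long-term memory that keeps the most confident past states."""
    capacity: int = 3
    min_confidence: float = 0.8
    entries: list = field(default_factory=list)  # (confidence, state) pairs

    def maybe_store(self, state, confidence):
        if confidence < self.min_confidence:
            return False
        self.entries.append((confidence, list(state)))
        # Keep only the `capacity` most confident snapshots.
        self.entries.sort(key=lambda e: e[0], reverse=True)
        del self.entries[self.capacity:]
        return True

    def best(self):
        return self.entries[0][1] if self.entries else None


def track_step(state, features, confidence, library, recover_below=0.3):
    """One step: short-term gated update, with snapshot fallback on low confidence."""
    if confidence < recover_below and library.best() is not None:
        # Likely occlusion: restore a trusted snapshot instead of absorbing noise.
        return list(library.best())
    state = gated_update(state, features)
    library.maybe_store(state, confidence)
    return state
```

In this sketch, a confident frame updates the short-term state and may be archived, while a low-confidence frame (for example, during occlusion) leaves the library untouched and returns the best stored snapshot. A real tracker would operate on feature maps and learn the gate, so this only illustrates the control flow between the two paths.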
Jin et al. (Sun,) studied this question.