
September 10, 2025
Open Access

KalmanFormer: Integrating a Deep Motion Model into SORT for Video Multi-Object Tracking

Key Points

KalmanFormer improves tracking accuracy in challenging scenarios such as occlusion and nonlinear motion.
It achieves HOTA scores of 66.6 on MOT17 and 63.2 on MOT20, with IDF1 scores of 82.0% and 80.1%, respectively.
The framework uses a transformer-based motion corrector that refines Kalman filter predictions by learning nonlinear residuals from historical trajectories.
Evaluations on the DanceTrack, MOT17, and MOT20 benchmarks indicate strong object association, particularly under occlusion.

Abstract

This paper studies how to integrate a deep motion model into Simple Online and Realtime Tracking (SORT) for video multi-object tracking. The tracking-by-detection paradigm faces significant challenges in handling nonlinear motion and occlusions. Although conventional Kalman-filter-based methods such as SORT are efficient, their linear motion assumption causes prediction errors to accumulate. We propose KalmanFormer, a framework that enhances Kalman-filter-based tracking through adaptive motion modeling for video sequences. KalmanFormer consists of three key components. First, an inner-trajectory motion corrector, built on the transformer architecture, refines Kalman filter predictions by learning nonlinear residuals from historical trajectories, improving adaptability to complex motion patterns in videos. Second, a cross-trajectory attention module captures inter-object motion correlations, significantly improving object association under occlusion. Third, a pseudo-observation generator supplies neural-network-based predictions when detections are missing, stabilizing the Kalman filter update. We evaluate KalmanFormer on the DanceTrack, MOT17, and MOT20 benchmarks to assess how well it handles complex motion and occlusion. It achieves competitive performance, with HOTA scores of 66.6 on MOT17 and 63.2 on MOT20, and strong identity preservation, with IDF1 scores of 82.0% and 80.1%, respectively.
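The abstract describes two mechanisms that fit naturally around a Kalman filter: a learned residual added to the linear prediction, and a corrected prediction used as a pseudo-observation when no detection arrives. The sketch below illustrates only that control flow and rests on several assumptions that are not from the paper: the state is a 2D point [x, y, vx, vy] rather than SORT's bounding-box state, the class name `ResidualKalmanTrack` and its noise values are hypothetical, and the hypothetical `accel_residual` finite-difference function stands in for the transformer-based corrector. The cross-trajectory attention module is not modeled, and the pseudo-observation rule here (reuse the corrected prediction) is a simplification of the paper's neural generator.

```python
import numpy as np


def accel_residual(history):
    """Hypothetical stand-in for the learned corrector (NOT the paper's
    transformer): half the second difference of the last three positions."""
    if len(history) < 3:
        return np.zeros(2)
    return 0.5 * (history[-1] - 2.0 * history[-2] + history[-3])


class ResidualKalmanTrack:
    """Constant-velocity Kalman track (assumed state [x, y, vx, vy],
    observation [x, y]) whose predicted position is shifted by a residual."""

    def __init__(self, xy, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 10.0  # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q  # process noise (illustrative value)
        self.R = np.eye(2) * r  # measurement noise (illustrative value)
        self.history = [np.asarray(xy, dtype=float)]

    def predict(self, residual_fn):
        # Linear prediction: the SORT-style constant-velocity assumption.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Nonlinear correction estimated from the trajectory history.
        self.x[:2] += residual_fn(np.stack(self.history))
        return self.x[:2].copy()

    def update(self, z):
        # Standard Kalman update with a position observation z = [x, y].
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        self.history.append(self.x[:2].copy())

    def step(self, detection, residual_fn):
        pred = self.predict(residual_fn)
        # Missing detection: feed the corrected prediction back as a
        # pseudo-observation so the update step still runs.
        z = pred if detection is None else detection
        self.update(z)
        return self.x[:2].copy()
```

Usage: call `track.step(det, accel_residual)` once per frame, passing `None` for frames where the detector misses the object. With a pseudo-observation the innovation is zero, so the position follows the corrected prediction while the covariance still shrinks; a real system would likely inflate `R` for such frames, which this sketch does not do.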
