Multi-object Tracking (MOT) is the task of tracking the motion trajectories of multiple objects across consecutive frames of a real-time video stream. Most MOT methods assign identities by associating bounding boxes whose Intersection over Union (IoU) scores exceed a threshold. However, when multiple objects cluster together, a high IoU value does not necessarily mean that the bounding box association is accurate. We reveal that appearance features may play a more important role in bounding box association. To this end, we propose an effective association method based on the geometric similarity of bounding boxes. In addition, because stable object tracking relies on accurate object detection, we use Partial Convolution (PConv) to enhance the feature extraction capability of the lightweight detector. Our method outperforms the baseline on both the MOT17 and MOT20 datasets across all the main MOT metrics (MOTA, IDF1, and HOTA), regardless of whether the same or different trackers are used. On MOT17, it achieves 80.4 MOTA, 78.0 IDF1, and 63.8 HOTA.
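To make the failure mode of IoU-threshold association concrete, the sketch below computes IoU for axis-aligned boxes and performs a simple greedy match against a threshold. It illustrates the conventional baseline criticized above, not the proposed geometric-similarity method. The box format `(x1, y1, x2, y2)`, the threshold of 0.3, and the greedy strategy (many trackers use the Hungarian algorithm instead) are all assumptions made for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_iou_match(tracks: List[Box], detections: List[Box],
                     threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Hypothetical greedy association: repeatedly take the highest-IoU
    (track, detection) pair whose IoU is at least the threshold."""
    pairs = sorted(
        ((iou(t, d), i, j)
         for i, t in enumerate(tracks)
         for j, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets = set(), set()
    matches = []
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are all below the threshold
        if i in used_tracks or j in used_dets:
            continue  # each track and detection is matched at most once
        matches.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)
    return matches


# Two clustered tracks and one detection lying exactly between them.
tracks = [(0.0, 0.0, 10.0, 10.0), (4.0, 0.0, 14.0, 10.0)]
detections = [(2.0, 0.0, 12.0, 10.0)]
print(iou(tracks[0], detections[0]), iou(tracks[1], detections[0]))
print(greedy_iou_match(tracks, detections))
```

In this example both tracks overlap the detection with the same IoU of 2/3, comfortably above the threshold, so the IoU score alone cannot decide which identity the detection belongs to; the match is settled only by the arbitrary tie-breaking order of the sort. This is the kind of ambiguity among clustered objects that motivates using additional cues beyond IoU.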
Ren et al. studied this question.