In this paper, we propose fully convolutional siamese fusion networks for object tracking. To obtain a strong feature representation from convolutional neural networks, we adopt a fusion strategy over convolutional layers. Specifically, we fuse three convolutional layers of VGGNet based on normalized cross correlation (NCC). First, we take three convolutional layers of VGGNet as the basis for fusion and reduce their size with a convolution kernel. Then, we resize the convolutional layers to be the same size as the deconvolutional layers for layer fusion. Next, we fuse the three layers based on the NCC between the target and the search region to produce the response map. Finally, we obtain the tracking result from the location of the maximum response in the response map. Various experiments on large-scale data sets verify that the proposed method is robust to occlusion, deformation, motion blur, and background clutter, and that it outperforms state-of-the-art trackers in terms of distance precision and overlap success.
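The last two steps (NCC-based fusion and picking the maximum response) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `ncc_response` and `fuse_and_locate`, the plain arrays standing in for VGGNet feature maps, and the weighted-average fusion rule are all assumptions made for illustration, since the abstract does not specify how the per-layer responses are combined. The sketch also assumes the three layers have already been brought to a common spatial size.

```python
import numpy as np


def ncc_response(target, search, eps=1e-8):
    """Slide a (C, h, w) target feature over a (C, H, W) search feature
    and return the zero-mean normalized cross correlation at each offset."""
    _, h, w = target.shape
    _, H, W = search.shape
    t = target - target.mean()
    t_norm = np.sqrt((t ** 2).sum()) + eps
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = search[:, i:i + h, j:j + w]
            wz = window - window.mean()
            w_norm = np.sqrt((wz ** 2).sum()) + eps
            out[i, j] = (t * wz).sum() / (t_norm * w_norm)
    return out


def fuse_and_locate(targets, searches, weights=None):
    """Fuse per-layer NCC maps and return (fused_map, peak_index).

    Assumption: fusion is a weighted average of the per-layer maps;
    the source does not state the actual fusion rule.
    """
    maps = [ncc_response(t, s) for t, s in zip(targets, searches)]
    if weights is None:
        weights = [1.0 / len(maps)] * len(maps)
    fused = sum(wt * m for wt, m in zip(weights, maps))
    peak = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, peak


# Hypothetical stand-ins for three resized feature layers.
rng = np.random.default_rng(0)
searches = [rng.standard_normal((c, 20, 20)) for c in (4, 8, 16)]
targets = [s[:, 5:11, 7:13].copy() for s in searches]  # crop at row 5, col 7
fused, peak = fuse_and_locate(targets, searches)
print(peak)
```

Because each target here is cropped directly from its search feature, the fused response should peak at the crop offset; in a real tracker the peak location in the response map would then be mapped back to image coordinates.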
Cen et al. (Fri,) studied this question.