ABSTRACT Vehicle re‐identification (ReID) is a key task in intelligent transportation systems, with significant impact on tracking target vehicles and developing smart cities. Vehicle ReID methods capture fine‐grained information by decoupling local features from vehicle images. However, differences in camera viewpoint and vehicle pose can cause substantial feature misalignment. Most existing methods align features using predefined external cues, which is inefficient and requires additional manual annotation. In this paper, we propose feature decoupled re‐identification (FDReID). The model uses ResNet‐50 as the backbone network to extract global features. It also includes a transformer‐based feature decoupling module that uses a class attention mechanism to extract fine‐grained vehicle features as a supplement to the global features, and it aligns these fine‐grained features through unsupervised clustering. Unlike previous methods, which either decompose vehicle features into structured features using extra annotations or coarsely partition the feature map into stripes or grids, our approach lets the model learn to capture discriminative fine‐grained information from flexibly decomposed features. Compared with existing methods, the model can be trained end to end without additional annotations or auxiliary models. In experiments on the VeRi‐776 dataset, FDReID improves on the StrongBaseline benchmark model by 7.6% in mean average precision (mAP), and by 2.6% and 1.3% in Rank1 and Rank5, respectively.
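For readers unfamiliar with the reported metrics, the following is a minimal sketch of how mean average precision and Rank-k accuracy are commonly computed for ReID retrieval. It is not the authors' evaluation code. The function names and the boolean-list input format are hypothetical, and the sketch assumes that each query's gallery ranking has already been filtered according to the dataset protocol (for example, removing same-camera matches).

```python
def average_precision(ranked_matches):
    """AP for one query.

    ranked_matches[i] is True when the i-th ranked gallery image
    has the same vehicle identity as the query (assumed input format).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            # Precision measured at each rank where a correct match appears.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rank_k_hit(ranked_matches, k):
    """True when at least one correct match is in the top-k results."""
    return any(ranked_matches[:k])


def evaluate(all_ranked_matches, ks=(1, 5)):
    """Return (mAP, {k: Rank-k accuracy}) averaged over all queries."""
    n = len(all_ranked_matches)
    m_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    cmc = {k: sum(rank_k_hit(r, k) for r in all_ranked_matches) / n for k in ks}
    return m_ap, cmc


if __name__ == "__main__":
    # Two toy queries, each with a three-image ranked gallery.
    toy = [[True, False, True], [False, True, False]]
    m_ap, cmc = evaluate(toy)
    print(f"mAP={m_ap:.4f} Rank1={cmc[1]:.2f} Rank5={cmc[5]:.2f}")
```

In this toy case the first query has an AP of (1/1 + 2/3)/2 and the second an AP of 1/2, so the mean is 2/3. Only one of the two queries has a correct match at rank 1, while both have one within the top 5.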
Wan et al. (Thu,) studied this question.