Pedestrian re-identification (ReID) models face two challenges. First, surveillance cameras capture pedestrians at long range, producing low-resolution, blurry images. Second, the image data suffers from insufficient samples, limited pose diversity, and cross-domain adaptation issues. To address these issues, we propose a wavelet-convolution-based person re-identification framework assisted by a Stable Diffusion-based identity-preserving image generation module. The backbone, WSNet, uses a dual-channel wavelet convolutional neural network to extract multi-scale features from pedestrian images, combined with cross-attention and gating mechanisms for dynamic fusion. Additionally, a pre-trained Pose2ID-based auxiliary generation branch synthesizes identity-preserving pedestrian views in diverse poses under human keypoint guidance. These generated views are used only at inference time: their WSNet features are fused with the feature of the original image to provide pose-complementary representation enhancement. Experiments on the Market-1501 and MSMT17 benchmark datasets show that our method achieves 92.1% mAP and 96.5% Rank-1 accuracy on Market-1501, and 60.1% mAP and 81.2% Rank-1 accuracy on MSMT17, with a WSNet backbone of 2.66 M parameters. Compared with the baseline models, the proposed method improves mAP by 5.1 and 7.6 percentage points on Market-1501 and MSMT17, respectively.
Xie et al. studied this question.
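The inference-time step described in the abstract, in which features of generated views are fused with the feature of the original image, can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted average below is an assumption, as are the function names `l2_normalize` and `fuse_with_generated_views` and the weight `alpha`. Feature vectors are plain lists standing in for backbone outputs.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Return vec scaled to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def fuse_with_generated_views(orig_feat, gen_feats, alpha=0.5):
    """Fuse one original-image feature with features of generated views.

    orig_feat: list of d floats, feature of the original image.
    gen_feats: list of k feature lists (each of length d), one per generated view.
    alpha:     hypothetical weight on the original feature (an assumption,
               not a value taken from the paper).
    """
    if not gen_feats:
        # No generated views available: fall back to the original feature.
        return l2_normalize(orig_feat)
    orig = l2_normalize(orig_feat)
    views = [l2_normalize(g) for g in gen_feats]
    # Average the pose-diverse views so no single synthetic pose dominates.
    gen_mean = [sum(v[i] for v in views) / len(views) for i in range(len(orig))]
    fused = [alpha * o + (1.0 - alpha) * m for o, m in zip(orig, gen_mean)]
    # Re-normalize so cosine similarity reduces to a dot product at retrieval.
    return l2_normalize(fused)


# Toy stand-ins for backbone features: one original image, two generated views.
original = [1.0, 0.0, 0.0]
generated = [[0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
fused = fuse_with_generated_views(original, generated, alpha=0.5)
print(len(fused), math.sqrt(sum(v * v for v in fused)))
```

Averaging the generated views before mixing keeps the original image's contribution fixed regardless of how many views are synthesized. A learned or attention-based fusion would be an equally plausible reading of the abstract.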