What question did this study set out to answer?

To develop a robust person re-identification framework that overcomes challenges of low-resolution images and limited training data.

April 17, 2026 | Open Access

WSNet: Person Re-Identification Based on Wavelet Convolution and Assisted by Image Generation at Inference Time

Key Points

Aimed to develop a robust person re-identification framework that overcomes the challenges of low-resolution images and limited training data.
Utilized a wavelet-convolution-based dual-channel network for feature extraction.
Incorporated cross-attention and gating mechanisms for dynamic data fusion.
Employed a Stable Diffusion-based image generation module at inference time to create diverse pedestrian views.
Used a Pose2ID-based auxiliary branch for synthesizing identity-preserving images.
Achieved an mAP of 92.1% and Rank-1 accuracy of 96.5% on the Market-1501 dataset.
Attained an mAP of 60.1% and Rank-1 accuracy of 81.2% on the MSMT17 dataset.
Improved mAP by 5.1 and 7.6 percentage points over the baseline models on Market-1501 and MSMT17, respectively.
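
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Rank-1 accuracy and mAP are conventionally computed from ranked retrieval results. The function name `rank1_and_map` and the boolean-list input format are illustrative assumptions, not taken from the study. The sketch also assumes that any protocol-specific filtering of the gallery (for example, excluding same-camera images of the query identity) has already been applied.

```python
def rank1_and_map(ranked_matches):
    """Compute Rank-1 accuracy and mean average precision (mAP).

    ranked_matches: one list per query. Each list is ordered from the best
    gallery match to the worst, and an entry is True when the gallery image
    at that rank shares the query's identity. (Hypothetical input format.)
    """
    rank1_hits = 0
    average_precisions = []
    for matches in ranked_matches:
        if not any(matches):
            continue  # a query with no correct gallery image is skipped
        if matches[0]:
            rank1_hits += 1  # the top-ranked gallery image is correct
        hits = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank  # precision at this recall point
        average_precisions.append(precision_sum / hits)
    if not average_precisions:
        raise ValueError("no query has a correct match in the gallery")
    n = len(average_precisions)
    return rank1_hits / n, sum(average_precisions) / n


# Two toy queries: the first is matched at ranks 1 and 3, the second at rank 2.
toy_queries = [[True, False, True], [False, True, False]]
rank1, mean_ap = rank1_and_map(toy_queries)
```

With these two toy queries, Rank-1 is 0.5 (only the first query has a correct top match) and mAP is about 0.667 (the mean of 0.833 and 0.5).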

Abstract

In pedestrian re-identification (ReID) tasks, existing models face dual challenges: first, surveillance cameras capture images at long distances with low resolution and blurriness; second, image data suffers from insufficient samples, limited poses, and cross-domain adaptation issues. To address these issues, we propose a wavelet-convolution-based person re-identification framework assisted by a Stable Diffusion-based identity-preserving image generation module used only at inference time. This approach employs a dual-channel wavelet convolutional neural network for multi-scale feature extraction of pedestrian images, combined with cross-attention and gating mechanisms for dynamic data fusion. Additionally, we incorporate a pre-trained Pose2ID-based auxiliary generation branch that synthesizes identity-preserving pedestrian views with diverse poses under human keypoint guidance. These generated views are used only at inference time, where their WSNet features are fused with the feature of the original image to provide pose-complementary representation enhancement. Experiments on the Market-1501 and MSMT17 benchmark datasets demonstrate that our method achieves an mAP of 92.1% and a Rank-1 accuracy of 96.5% on Market-1501, and an mAP of 60.1% and a Rank-1 accuracy of 81.2% on MSMT17, with a WSNet backbone of 2.66 M parameters. Compared with the baseline models, the proposed method improves mAP by 5.1 and 7.6 percentage points on Market-1501 and MSMT17, respectively.
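
The inference-time step described in the abstract, in which features of generated views are fused with the feature of the original image, can be sketched roughly as follows. The abstract does not state the fusion rule, so the weighted average used here, the `original_weight` parameter, and the function names are all assumptions made for illustration. Plain Python lists stand in for the feature vectors that the WSNet backbone would produce.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(original, generated_views, original_weight=0.5):
    """Blend the original feature with the mean of the generated-view features.

    ASSUMPTION: a weighted average followed by L2 normalisation. The study's
    actual fusion rule is not given in the abstract.
    """
    if not generated_views:
        return l2_normalize(original)  # nothing to fuse, keep the original
    if any(len(view) != len(original) for view in generated_views):
        raise ValueError("all feature vectors must have the same length")
    count = len(generated_views)
    mean_generated = [
        sum(view[i] for view in generated_views) / count
        for i in range(len(original))
    ]
    fused = [
        original_weight * o + (1.0 - original_weight) * g
        for o, g in zip(original, mean_generated)
    ]
    return l2_normalize(fused)


def cosine_similarity(a, b):
    """Cosine similarity, a common way to rank gallery images against a query."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


# Toy 2-D features: the generated views contribute a direction that the
# original image's feature lacks.
query_feature = fuse_features([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0]])
```

Because the generated views are used only at inference time, a scheme of this kind leaves the trained backbone unchanged and only alters the descriptor used for matching.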
