Person re-identification (Re-ID) in single-gallery scenarios, where each individual has only one registration image, is highly sensitive to viewpoint because the gallery lacks pose diversity. This study introduces ViewSynthReID, a generative augmentation framework that uses Wan2.2, a recent diffusion-based video generation model, to synthesize 360° viewpoint coverage from a single input image. The pipeline uses MediaPipe for automatic frontal pose selection, the Hybrid Attention Transformer (HAT) for texture-preserving super-resolution, and diffusion synthesis to generate photorealistic multi-pose variants; the resulting images are passed to the lightweight OSNet backbone for multi-scale feature extraction. On Market-1501, overall rank metrics degraded slightly, a drop attributed to synthetic artifacts (Rank-1: 92.3% → 91.8%), but the method delivered targeted gains on challenging viewpoint transitions: 75 of 3,368 queries (2.2%) showed Rank-1 improvements averaging +12.4%, with 28 cases exceeding +25%. These gains were most pronounced for viewpoint gaps greater than 90°, indicating that generative synthesis can bridge pose gaps that traditional augmentation does not cover. For real-world deployment, an inference pipeline was built that combines YOLO26 pedestrian detection with TensorRT-optimized OSNet, achieving 7.20 FPS and 135 ms latency on 4K video streams. The system targets smart city applications, including real-time crowd monitoring, lost person recovery, and traffic behavior analysis, and suggests that targeted generative augmentation can move single-shot Re-ID from a research curiosity to a deployable surveillance technology.
Ifuku et al. (Thu,) studied this question.