What question did this study set out to answer?

It set out to reduce viewpoint sensitivity in person re-identification by using a new generative framework for data augmentation.

April 5, 2026 | Open Access

Real-Time Person Re-Identification Using Image Generation-Based Data Augmentation

Key Points

Introduced the ViewSynthReID framework, which uses the Wan2.2 diffusion model for viewpoint synthesis.
Employed MediaPipe for automated frontal pose selection.
Used a Hybrid Attention Transformer (HAT) for texture-preserving super-resolution.
Integrated the pipeline with the OSNet backbone for efficient feature extraction.
Engineered an inference pipeline with YOLO26 and TensorRT for real-time applications.
Overall Rank-1 on Market-1501 decreased slightly, from 92.3% to 91.8%, due to synthetic artifacts.
Rank-1 improved by an average of +12.4% on 75 of 3,368 queries (2.2%), all involving challenging viewpoint transitions.
Improvements were most pronounced for viewpoint gaps greater than 90°.
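
For readers unfamiliar with the metric, Rank-1 is the fraction of queries whose single nearest gallery entry carries the correct identity. The following is a minimal sketch under stated assumptions: the embeddings are made-up toy values, cosine distance is an assumed choice of metric, and the function names are hypothetical. It also omits the same-camera filtering that the standard Market-1501 protocol applies, so it is not the paper's evaluation code.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank1_accuracy(queries, gallery):
    """Fraction of queries whose nearest gallery entry has the same identity.

    queries, gallery: lists of (identity, embedding) pairs.
    """
    hits = 0
    for q_id, q_emb in queries:
        nearest_id, _ = min(gallery, key=lambda item: cosine_distance(q_emb, item[1]))
        hits += int(nearest_id == q_id)
    return hits / len(queries)


# Toy embeddings, invented for illustration only (not data from the paper).
gallery = [("A", [1.0, 0.0]), ("B", [0.0, 1.0])]
queries = [("A", [0.9, 0.1]), ("B", [0.2, 0.8]), ("B", [0.7, 0.3])]
toy_rank1 = rank1_accuracy(queries, gallery)  # third query is closer to "A", so it misses

# Share of queries reported as improved: 75 of 3,368.
improved_share = 75 / 3368
print(f"toy Rank-1: {toy_rank1:.3f}, improved share: {improved_share:.1%}")
```

The last two lines simply recompute the reported share of improved queries from the counts given in the key points.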

Abstract

Person re-identification (Re-ID) in single-gallery scenarios, where each individual has only one registration image, suffers from severe viewpoint sensitivity because the gallery lacks pose diversity. This study introduces ViewSynthReID, a generative augmentation framework that uses Wan2.2, a recent diffusion-based video generation model, to synthesize 360° viewpoint coverage from a single input image. The pipeline uses MediaPipe for automatic frontal pose selection, a Hybrid Attention Transformer (HAT) for texture-preserving super-resolution, and diffusion synthesis to create photorealistic multi-pose variants, which are integrated with the lightweight OSNet backbone for efficient multi-scale feature extraction. On Market-1501, overall rank metrics degraded slightly because of synthetic artifacts (Rank-1: 92.3% → 91.8%), but the method delivered targeted gains on challenging viewpoint transitions: 75 of 3,368 queries (2.2%) showed Rank-1 improvements averaging +12.4%, with 28 cases exceeding +25%. These gains were most pronounced for viewpoint gaps greater than 90°, indicating that generative synthesis can bridge pose gaps that traditional augmentation cannot. For real-world deployment, the study also engineers an inference pipeline that combines YOLO26 pedestrian detection with TensorRT-optimized OSNet and achieves 7.20 FPS at 135 ms latency on 4K video streams. The system targets smart city applications, including real-time crowd monitoring, lost person recovery, and traffic behavior analysis, and suggests that strategic generative augmentation can move single-shot Re-ID from a research curiosity toward deployable surveillance technology.
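
The abstract does not say how the synthesized views enter the matching step. One plausible reading, sketched below purely as an assumption, is that each identity's gallery entry is expanded with embeddings of its synthesized views, and a query is scored against the closest view. The 2-D embeddings, the dictionary layout, and the names `identity_distance` and `match` are hypothetical and chosen only to show why an extra view can fix a large viewpoint gap.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def identity_distance(query_emb, view_embs):
    """Distance from a query to an identity, taken as the closest of its views."""
    return min(cosine_distance(query_emb, v) for v in view_embs)


def match(query_emb, gallery_views):
    """Return the identity whose closest view is nearest to the query.

    gallery_views: dict mapping identity -> list of embeddings
    (one real registration image plus any synthesized views).
    """
    return min(gallery_views, key=lambda pid: identity_distance(query_emb, gallery_views[pid]))


# Toy embeddings, invented for illustration only (not data from the paper).
single_shot = {"A": [[1.0, 0.0]], "B": [[0.6, 0.8]]}
augmented = {
    "A": [[1.0, 0.0], [0.0, 1.0]],  # second vector stands in for a synthesized back view (assumed)
    "B": [[0.6, 0.8]],
}
back_view_query = [0.1, 0.99]  # person "A" observed from behind

print(match(back_view_query, single_shot))  # mismatches to "B" with only the frontal view
print(match(back_view_query, augmented))    # recovers "A" once a back view is available
```

Under this assumed scheme, every added view is also another chance for a false match, which is at least consistent with the small overall Rank-1 drop reported alongside the targeted gains.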
