What question did this study set out to answer?

The aim is to improve retrieval of pedestrian images across visible and infrared modalities without labeled data.

April 10, 2026

Transporting the Cross-modal Prototypes for Unsupervised Visible-Infrared Person Re-identification

Key Points

Implemented an unsupervised learning approach for visible-infrared person re-identification.
Designed an iterative training loop that alternates between model training and cross-modality matching.
Used uniform-prior-guided optimal transport to select matched visible and infrared prototypes.
Applied entropy minimization and a uniform label distribution to reduce information loss.
Achieved 69.4% Rank-1 accuracy on the SYSU-MM01 benchmark without annotations.
Achieved 89.4% Rank-1 accuracy on the RegDB benchmark without annotations.
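
For readers unfamiliar with the metric, the sketch below illustrates how Rank-1 accuracy is commonly computed for a retrieval task: each query counts as correct when its single most similar gallery image shares its identity. The function name `rank1_accuracy`, the cosine similarity, and the toy features are assumptions made for illustration; the official SYSU-MM01 and RegDB evaluation protocols include dataset-specific rules that are not reproduced here.

```python
import numpy as np

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose top-1 gallery match has the same identity.

    Simplified illustration: cosine similarity, no dataset-specific filtering.
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = q @ g.T                      # shape: (num_queries, num_gallery)
    top1 = sim.argmax(axis=1)          # index of the best gallery match per query
    return float(np.mean(gallery_ids[top1] == query_ids))

# Toy data: queries stand in for one modality, the gallery for the other.
gallery_feats = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
gallery_ids = np.array([0, 1, 2])
query_feats = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [-0.9, 0.1]])
query_ids = np.array([0, 1, 1, 2])

acc = rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids)
print(f"Rank-1 accuracy: {acc:.2%}")
```

In this toy case the third query is closest to a gallery image of a different identity, so three of the four queries are counted as correct.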

Abstract

Unsupervised visible-infrared person re-identification (USVI-ReID) is a challenging task that retrieves pedestrian images across modalities without using any label information. In this task, the large cross-modality variance makes it difficult to generate reliable cross-modality labels, and the lack of annotations makes it harder to learn modality-invariant features. To facilitate this unsupervised cross-modal learning, we begin by leveraging the information contained in the cross-modality input and its predicted label. To minimize information loss, we optimize the model by incorporating entropy minimization, a uniform label distribution, and cross-modality matching. We design an iterative training strategy that alternates between model training and cross-modality matching, in which a uniform-prior-guided optimal transport assignment selects matched visible and infrared prototypes. This matching information is then used to minimize the intra- and cross-modality entropy. As a result, our model can gradually self-learn useful information, enabling it to generate discriminative representations for unlabeled cross-modal data. Extensive experimental results on benchmarks demonstrate the effectiveness of our method, e.g., 69.4% and 89.4% Rank-1 accuracy on SYSU-MM01 and RegDB, respectively, without any annotations. The code will be released soon.
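
As a rough illustration of the matching step described in the abstract, the sketch below pairs visible and infrared prototypes with entropy-regularized optimal transport under uniform marginals, solved by Sinkhorn iterations. It is a minimal sketch under stated assumptions, not the authors' implementation: the cosine cost, the regularization strength `eps`, the iteration count, the function names, and the synthetic prototypes are all hypothetical choices made for the example.

```python
import numpy as np

def cosine_cost(a, b):
    """Pairwise cost 1 - cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def sinkhorn_uniform(cost, eps=0.05, n_iters=200):
    """Entropy-regularized transport plan with uniform row and column marginals."""
    n, m = cost.shape
    r = np.full(n, 1.0 / n)            # uniform prior over visible prototypes
    c = np.full(m, 1.0 / m)            # uniform prior over infrared prototypes
    K = np.exp(-cost / eps)            # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = r / (K @ v)                # rescale rows toward marginal r
        v = c / (K.T @ u)              # rescale columns toward marginal c
    return u[:, None] * K * v[None, :]

# Synthetic prototypes (assumption): infrared prototypes are shuffled,
# slightly noisy copies of the visible ones, so the true pairing is known.
rng = np.random.default_rng(0)
perm = np.array([2, 0, 1, 5, 3, 4])
vis_protos = rng.normal(size=(6, 16))
ir_protos = vis_protos[perm] + 0.05 * rng.normal(size=(6, 16))

plan = sinkhorn_uniform(cosine_cost(vis_protos, ir_protos))
matches = plan.argmax(axis=1)          # infrared prototype chosen for each visible one
print(matches)
```

The uniform marginals prevent the plan from assigning many visible prototypes to a single infrared prototype, which is the intuition behind using a uniform prior for matching. In a full training loop, matches like these would be recomputed after each round of model updates.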
