What does this research mean for the field?

A Fusion Multi-modal Transformer (FMT) that uses bidirectional multi-modal attention fuses complementary semantic information from visible and infrared images, achieving highly accurate person re-identification. Novelty: methodological. Consensus alignment: neutral.

December 8, 2023

Bidirectional Multi-modal Attention for Visible-Infrared Person Re-Identification

Abstract

With the rapid development of intelligent monitoring, person re-identification has become one of the most active research areas. In recent years, the widespread deployment of RGB-IR dual-mode cameras has made infrared images available to assist person re-identification. This addresses the problem that visible-light cameras cannot capture usable person images under complex lighting conditions, such as at night or in low-light environments. Most existing methods focus on extracting modality-shared features and on modality-level alignment. In this paper, we propose a Fusion Multi-modal Transformer (FMT) for visible-infrared person re-identification. Concretely, our method uses a Transformer-based feature extractor to generate discriminative features from the two modalities. In addition, a bidirectional multi-modal attention block is introduced to obtain modality-specific information and fuse complementary semantic information for identification. In our experiments, the model achieves 99.86% prediction accuracy on SYSU-MM01 and 94.13% accuracy on LLCM.
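
The abstract does not spell out the internals of the bidirectional multi-modal attention block, so the following is only a minimal NumPy sketch of the general idea: each modality's tokens attend to the other modality's tokens, and the two enriched streams are pooled and concatenated. The function names (`cross_attention`, `bidirectional_fusion`), the residual connection, the mean pooling, and the omission of learned query/key/value projections are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Scaled dot-product attention: `queries` tokens attend to `context` tokens.

    Assumption of this sketch: learned Q/K/V projections are omitted, so the
    raw tokens serve directly as queries, keys, and values.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)  # shape (n_q, n_ctx)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ context                   # shape (n_q, d)


def bidirectional_fusion(vis_tokens, ir_tokens):
    """Hypothetical fusion: each modality attends to the other, then pool."""
    # Visible tokens gather information from infrared tokens (residual add).
    vis_enriched = vis_tokens + cross_attention(vis_tokens, ir_tokens)
    # Infrared tokens gather information from visible tokens (residual add).
    ir_enriched = ir_tokens + cross_attention(ir_tokens, vis_tokens)
    # Mean-pool each stream and concatenate into a single descriptor.
    return np.concatenate([vis_enriched.mean(axis=0), ir_enriched.mean(axis=0)])


rng = np.random.default_rng(0)
vis = rng.normal(size=(6, 8))  # 6 visible-image tokens, 8 dimensions each
ir = rng.normal(size=(4, 8))   # 4 infrared-image tokens, 8 dimensions each
fused = bidirectional_fusion(vis, ir)
print(fused.shape)  # (16,)
```

In a real model the tokens would come from the Transformer-based feature extractor, and the attention would normally use learned projections and multiple heads. The sketch only shows the two directions of information flow (visible to infrared and infrared to visible) that "bidirectional" refers to.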
