
September 10, 2025

Pedestrian Re-identification Based on Joint Attention Mechanism and Multimodal Features

Key Points

The method achieves top-1 accuracy of 94.3% on the RegDB dataset and 88.7% on SYSU-MM01.
A joint attention mechanism strengthens pedestrian feature representation by adaptively weighting features from multiple modalities.
Built on a residual network backbone, the model combines RGB, thermal infrared, and depth features.
Training combines modal cosine cross-entropy, cross-modal triplet, center alignment, and modal consistency losses to suppress modal discrepancies and improve accuracy in complex scenarios.
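
The adaptive weighting of modalities can be illustrated with a minimal sketch. This is not the paper's module: the scaled dot-product scoring, the `query` vector, and the function names `softmax` and `fuse_modalities` are assumptions chosen only to show how per-modality attention weights could be computed and applied to RGB, thermal, and depth feature vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, query):
    """Weight each modality's feature vector by an attention score and sum them.

    features: dict mapping a modality name to a feature vector (list of floats).
    query:    hypothetical query vector of the same length (an assumption,
              standing in for whatever the real network learns).
    Returns (weights_by_modality, fused_vector).
    """
    names = sorted(features)
    dim = len(query)
    # Scaled dot-product score between the query and each modality's features.
    scores = [
        sum(q * f for q, f in zip(query, features[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the modality features, dimension by dimension.
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


if __name__ == "__main__":
    feats = {
        "rgb": [0.9, 0.1, 0.0],
        "thermal": [0.2, 0.8, 0.1],
        "depth": [0.1, 0.1, 0.7],
    }
    weights, fused = fuse_modalities(feats, query=[1.0, 0.5, 0.0])
    print(weights)
    print(fused)
```

In this toy setup, a modality whose features align more closely with the query receives a larger weight, which is the general idea behind attention-based weighting; the actual cross-modal self-attention module may differ substantially.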

Abstract

To address the challenges of insufficient robustness in single-modal features and interference from cross-modal disparities in pedestrian re-identification under complex scenarios, we propose a novel network model that integrates joint attention mechanisms and multimodal features. Built upon a residual network backbone, the model introduces a cross-modal self-attention module to adaptively weight features from RGB, thermal infrared, and depth modalities. A multimodal feature fusion module is designed with three branches: intra-modal enhancement, cross-modal correlation, and modal discrepancy suppression, which together construct comprehensive pedestrian feature representations. During optimization, we introduce a combination of modal cosine cross-entropy loss, cross-modal triplet loss, center alignment loss, and modal consistency loss, updating the network using a min-max strategy. The proposed method achieves top-1 accuracy rates of 94.3% and 88.7% on the RegDB and SYSU-MM01 datasets, respectively, demonstrating its effectiveness in multimodal pedestrian re-identification scenarios.
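
Of the loss terms named above, the cross-modal triplet loss is the easiest to sketch. The abstract does not give its exact form, so the following assumes the standard margin-based triplet loss with Euclidean distance, applied to an anchor from one modality and positive and negative samples from another. The margin value of 0.3 and the function names are illustrative assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cross_modal_triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard margin-based triplet loss (assumed form, not the paper's).

    anchor:   feature of an identity in one modality (e.g. RGB).
    positive: feature of the same identity in another modality (e.g. thermal).
    negative: feature of a different identity in that other modality.
    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    rgb_anchor = [0.0, 0.0]
    thermal_same_id = [0.1, 0.0]
    thermal_other_id = [3.0, 4.0]
    print(cross_modal_triplet_loss(rgb_anchor, thermal_same_id, thermal_other_id))
```

Minimizing a loss of this shape pulls features of the same person together across modalities and pushes different identities apart, which matches the stated goal of suppressing modal discrepancies. How this term is weighted against the other three losses is not specified in the abstract.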
