What question did this study set out to answer?

The research aims to enhance person re-identification in video sequences by integrating spatial and temporal features.

February 14, 2026 | Open Access

A CNN-RNN Siamese framework with multi-level aggregation for video-based person re-identification

Key Points

Aimed to enhance person re-identification in video sequences by integrating spatial and temporal features.
Developed a compact CNN-GRU architecture for efficient processing.
Incorporated multi-level similarity aggregation to capture detailed features.
Evaluated performance against conventional and Siamese-based methods.
Showed significant improvements in recognition accuracy compared to traditional approaches.
Confirmed the effectiveness of combining spatial and temporal feature extraction.
Demonstrated resource efficiency suitable for deployment in constrained environments.

Abstract

Person re-identification (re-ID) in video sequences is a central task in surveillance and computer vision, yet it continues to present substantial challenges due to occlusion, viewpoint variation, and noisy frames. This study proposes a compact deep learning framework that integrates convolutional features, recurrent temporal modeling, and multi-level similarity aggregation to effectively capture both fine-grained spatial cues and long-range temporal patterns. The framework is deliberately designed as a compact CNN-GRU architecture, thereby avoiding the depth and computational demands of transformer-based backbones while preserving robust recognition capabilities. Experimental evaluations reveal clear advantages over conventional and Siamese-based approaches, confirming the complementary nature of spatial and temporal features and the effectiveness of efficient pooling strategies. These findings indicate that accurate and resource-efficient person re-ID can be achieved through compact architectures, offering practical potential for implementation in real-world, resource-constrained environments.
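The pipeline described in the abstract (per-frame features, a GRU over time, pooling, and a Siamese comparison at more than one level) can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not give layer sizes, the pooling operator, or the aggregation rule, so everything below is an assumption made for illustration: the per-frame vectors stand in for CNN outputs, the GRU omits bias terms and uses random weights, pooling is a plain temporal mean, and the "multi-level" score is a weighted sum (hypothetical parameter `alpha`) of a spatial-level and a temporal-level Euclidean distance. All function names are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_gru(in_dim, hid_dim, seed=0):
    """Random GRU weights (assumption: no biases, untrained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # One (input, recurrent) matrix pair per gate:
    # update (z), reset (r), candidate (n).
    return {g: (mat(hid_dim, in_dim), mat(hid_dim, hid_dim)) for g in "zrn"}


def gru_step(params, x, h):
    """One GRU update: new hidden state from frame feature x and state h."""
    (Wz, Uz), (Wr, Ur), (Wn, Un) = params["z"], params["r"], params["n"]
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(Wn, x), matvec(Un, rh))]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def mean_pool(vectors):
    """Average a list of equal-length vectors over time."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def encode(params, frames):
    """Return (spatial-level, temporal-level) descriptors for one sequence.

    `frames` is a list of per-frame feature vectors, standing in for
    CNN outputs (assumption).
    """
    hid_dim = len(params["z"][0])
    h = [0.0] * hid_dim
    states = []
    for x in frames:
        h = gru_step(params, x, h)
        states.append(h)
    return mean_pool(frames), mean_pool(states)


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def siamese_distance(params, seq_a, seq_b, alpha=0.5):
    """Both sequences pass through the same weights (the Siamese part);
    distances at the two levels are combined with weight alpha (assumption)."""
    spa_a, tmp_a = encode(params, seq_a)
    spa_b, tmp_b = encode(params, seq_b)
    return alpha * euclidean(spa_a, spa_b) + (1 - alpha) * euclidean(tmp_a, tmp_b)


if __name__ == "__main__":
    rng = random.Random(1)
    params = init_gru(in_dim=4, hid_dim=3)
    # Two toy "tracklets" of different lengths with 4-dimensional frame features.
    seq_a = [[rng.random() for _ in range(4)] for _ in range(5)]
    seq_b = [[rng.random() for _ in range(4)] for _ in range(7)]
    print("same sequence:", siamese_distance(params, seq_a, seq_a))
    print("different sequences:", siamese_distance(params, seq_a, seq_b))
```

Because both sequences share one set of weights, a sequence compared with itself yields a distance of zero, and sequences of different lengths remain comparable after pooling. In a trained system the weights would be learned with a pairwise loss so that sequences of the same person score lower than sequences of different people; that training step is outside this sketch.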
