Gait recognition is a valuable biometric method for public security and edge intelligence, but current models often fail to capture fine-grained motion cues and are too computationally heavy for resource-limited devices. To address these challenges, we propose a lightweight framework that combines global spatial features from a residual backbone with local motion refinement via region-aware 3D convolutions. A channel-wise attention mechanism suppresses redundant channels and emphasizes discriminative features. Sequence features are aggregated by pooling, and BNNeck then normalizes the pooled embedding, which improves training stability and the effectiveness of the classification head. We further introduce an entropy-regularized loss that penalizes high-entropy (uncertain) predictions, encouraging more compact and view-consistent representations without adding model parameters or inference cost. Our method outperforms lightweight baselines on CASIA-B and generalizes well to OUMVLP and CASIA-C, achieving high accuracy with a reduced parameter count, which makes it well suited to edge deployment.
Cheng et al. (Sun,) studied this question.
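To make the entropy-regularized objective concrete, the following is a minimal sketch in plain Python. The exact form of the loss is not given in the text above, so this sketch assumes one common formulation: cross-entropy plus a weighted Shannon entropy of the softmax output. The function names and the weight `lam` are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-12):
    """Shannon entropy of a probability vector (high when the prediction is uncertain)."""
    return -sum(p * math.log(p + eps) for p in probs)

def entropy_regularized_loss(logits, target, lam=0.1, eps=1e-12):
    """Assumed form: cross-entropy + lam * entropy of the predicted distribution.

    `lam` is a hypothetical weight; the text above does not specify its value.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target] + eps)
    return cross_entropy + lam * entropy(probs)

# A confident, correct prediction is penalized less than a uniform one.
confident = entropy_regularized_loss([10.0, 0.0, 0.0], target=0)
uncertain = entropy_regularized_loss([0.0, 0.0, 0.0], target=0)
```

Because the extra term depends only on the predicted probabilities, it affects training alone: under this assumed form it introduces no additional parameters and is not evaluated at inference time, which is consistent with the claim above.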