The strong performance of the Vision Transformer (ViT) across many computer vision tasks has attracted wide attention from researchers. In finger vein recognition, ViT outperforms prior techniques without architectural changes: its fundamental structure is kept, and the focus is on tight regularization within the MLP head. ViT's self-attention mechanism captures both global and local dependencies in finger vein images, so no explicit feature engineering is required. To prepare images for direct representation learning, each image is first segmented into patches, the patches are projected into a sequence of embeddings, and the sequence is then processed by the ViT encoder. Compared with conventional CNNs, ViT provides improved feature comprehension, scalability, and generalization. Its performance in biometrics, together with the adaptability of transformer architectures across many computer vision scenarios, demonstrates ViT's potential. FV-ViT represents a substantial improvement in finger vein identification. According to experiments on a public dataset, the proposed approach outperforms current methods in recognition performance, with an EER of around 0.025 and an average recognition accuracy of 98.5%.
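The patch preparation step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the authors' implementation: the image size (64×128), patch size (16), embedding dimension (32), random projection weights, and the function names `patchify` and `embed_patches` are all hypothetical. The prepended class token and the additive position embeddings follow the standard ViT design and are not stated in the abstract.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W) grayscale image into flattened, non-overlapping patches."""
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, p, cols, p) -> (rows, cols, p, p) -> (rows * cols, p * p)
    patches = image.reshape(rows, patch_size, cols, patch_size)
    patches = patches.transpose(0, 2, 1, 3)
    return patches.reshape(rows * cols, patch_size * patch_size)

def embed_patches(patches, embed_dim, rng):
    """Linearly project patches into a token sequence for a ViT encoder."""
    num_patches, patch_dim = patches.shape
    # Assumption: randomly initialised weights stand in for learned parameters.
    w_proj = rng.normal(0.0, 0.02, size=(patch_dim, embed_dim))
    cls_token = np.zeros((1, embed_dim))  # standard ViT class token
    pos_embed = rng.normal(0.0, 0.02, size=(num_patches + 1, embed_dim))
    tokens = np.concatenate([cls_token, patches @ w_proj], axis=0)
    return tokens + pos_embed

rng = np.random.default_rng(0)
image = rng.random((64, 128))      # hypothetical finger vein image
patches = patchify(image, 16)      # 4 x 8 = 32 patches of 256 pixels each
tokens = embed_patches(patches, 32, rng)
print(patches.shape, tokens.shape)
```

The resulting `tokens` array is the sequence a transformer encoder would consume; in a trained model the projection, class token, and position embeddings would be learned rather than sampled.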
Gurunathan et al. (Fri,) studied this question.