Offline handwritten signature verification (OSV) remains a challenging biometric task owing to the subtle variability of genuine signatures and the sophistication of skilled forgeries. This study introduces a unified benchmarking framework for evaluating eight deep learning architectures (CNN Shallow, CNN Deep, ResNet-18, ResNet-34, MobileNetV2, EfficientNet-B0, ViT-Tiny, and a CNN–Transformer Hybrid) within a writer-independent Siamese contrastive-learning paradigm. The framework standardizes preprocessing, balanced pair generation, NVIDIA A100 GPU training, and a comprehensive evaluation suite that includes ROC and Precision-Recall curves, Equal Error Rate (EER), calibration analysis, threshold-sensitivity metrics, and embedding visualizations using PCA, t-SNE, and UMAP. The experiments reveal a clear performance stratification: six architectures achieve perfect verification performance (Accuracy = 1.0, ROC AUC = 1.0, PR AUC = 1.0, EER = 0.0), supported by consistently well-separated embedding manifolds and highly stable calibration behavior. In contrast, MobileNetV2 and EfficientNet-B0 exhibit elevated EER values and overlapping embeddings, underscoring the limitations of lightweight and compound-scaled models in capturing fine-grained stroke morphology. The proposed framework establishes a transparent and extensible foundation for future research, enabling fair cross-model comparisons and guiding the development of robust, deployment-ready biometric verification systems. In addition, this study provides the first fully controlled, architecture-agnostic comparison of CNNs, residual networks, lightweight mobile models, and transformer-based architectures under identical, writer-independent conditions. By eliminating variability in preprocessing, pair generation, and training configuration, the framework isolates the effect of architectural design on verification performance.
The findings highlight the importance of embedding separability, calibration stability, and threshold robustness—factors often overlooked in prior OSV research but essential for real-world deployment.
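To make the two central quantities concrete, the sketch below illustrates a pairwise contrastive loss and an EER computation over pair distances. It is not the implementation used in this study: the classic margin-based form of the loss, the margin value, the acceptance rule (distance at or below the threshold), and the function names `contrastive_loss` and `equal_error_rate` are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def contrastive_loss(distance: float, is_genuine: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one signature pair (assumed form).

    Genuine pairs are pulled together (penalty grows with distance);
    forgery pairs are pushed apart until they are at least `margin` away.
    """
    if is_genuine:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def equal_error_rate(
    distances: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (eer, threshold) for pair distances.

    labels: 1 for a genuine pair, 0 for a forgery pair.
    Assumed decision rule: accept a pair when distance <= threshold.
    The EER is approximated at the candidate threshold where the false
    acceptance rate (FAR) and false rejection rate (FRR) are closest.
    """
    genuine = [d for d, y in zip(distances, labels) if y == 1]
    forged = [d for d, y in zip(distances, labels) if y == 0]
    if not genuine or not forged:
        raise ValueError("need at least one genuine and one forgery pair")

    # Candidate thresholds: every observed distance, plus one below all
    # of them so that the "reject everything" operating point is covered.
    candidates = sorted(set(distances))
    candidates = [candidates[0] - 1.0] + candidates

    best_gap, best_eer, best_threshold = None, 0.0, candidates[0]
    for t in candidates:
        far = sum(d <= t for d in forged) / len(forged)    # forgeries accepted
        frr = sum(d > t for d in genuine) / len(genuine)   # genuine rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, t
    return best_eer, best_threshold
```

Under these assumptions, an EER of 0.0 corresponds to the case where some threshold separates every genuine pair from every forgery pair, while overlapping distance distributions necessarily yield a non-zero EER.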
Eissa Alreshidi (Thu,) studied this question.