Glaucoma is a leading cause of irreversible blindness worldwide, so early and reliable detection is essential for effective clinical management. This study proposes a deep hybrid ensemble framework for automated glaucoma detection from retinal fundus images that integrates convolutional and transformer-based architectures. The framework combines EfficientNetB4, Vision Transformer (ViT), and Bidirectional Encoder representation from Image Transformers (BEiT) to jointly capture local retinal structures and global contextual representations. A feature-level, attention-based fusion mechanism integrates the complementary features from these models. On the G1020 dataset, the proposed framework outperforms the individual models and conventional ensemble methods, achieving an accuracy of 98.59%, a sensitivity of 96.51%, a specificity of 99.45%, and an AUC of 0.9989. To assess generalization, the trained model was further validated on the external ACRIMA dataset, where it achieved 96.37% accuracy and an AUC of 0.9881. These results indicate that the proposed hybrid architecture detects glaucoma reliably across both datasets. The framework could support AI-assisted glaucoma screening and future digital twin-based precision ophthalmology systems.
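The abstract does not spell out how the feature-level attention fusion is computed, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes each backbone yields one feature vector per image, projects the three vectors into a shared space, scores each with a small additive-attention term, and takes a softmax-weighted sum. The function name `attention_fuse`, the feature widths, the shared dimension of 256, and the random stand-in features and weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def attention_fuse(features, proj_mats, score_vec):
    """Fuse per-model feature vectors with softmax attention weights.

    features:  list of arrays, each of shape (batch, d_i)
    proj_mats: list of arrays, each of shape (d_i, d), one per model
    score_vec: array of shape (d,), used to score each projected feature
    Returns (fused, weights) with shapes (batch, d) and (batch, n_models).
    """
    # Project every model's features into a shared d-dimensional space.
    projected = np.stack(
        [f @ w for f, w in zip(features, proj_mats)], axis=1
    )  # (batch, n_models, d)
    # One scalar relevance score per model and per image.
    scores = np.tanh(projected) @ score_vec  # (batch, n_models)
    weights = softmax(scores, axis=1)
    # Attention-weighted sum over the model axis.
    fused = np.sum(weights[..., None] * projected, axis=1)  # (batch, d)
    return fused, weights


# Random stand-ins for backbone outputs; the widths below are assumptions.
rng = np.random.default_rng(0)
batch, shared_dim = 4, 256
feature_dims = {"efficientnet_b4": 1792, "vit": 768, "beit": 768}

features = [rng.normal(size=(batch, d)) for d in feature_dims.values()]
proj_mats = [
    rng.normal(scale=d ** -0.5, size=(d, shared_dim))
    for d in feature_dims.values()
]
score_vec = rng.normal(scale=shared_dim ** -0.5, size=shared_dim)

fused, weights = attention_fuse(features, proj_mats, score_vec)
```

In a trained system the projection matrices and the scoring vector would be learned jointly with a classification head on top of `fused`. Each row of `weights` would then show how much each backbone contributed to the prediction for that image.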
Kumar et al. (Thu,) studied this question.