Oral cancer (OC) remains a major global health concern, with survival often limited by late diagnosis. Early and accurate detection is essential to improve patient outcomes and guide treatment decisions. In this study, we propose a computer-aided diagnostic (CAD) framework for classifying oral squamous cell carcinoma from histopathology images. The model combines a Swin Transformer for hierarchical feature extraction with a Vision Transformer (ViT) to capture long-range dependencies across image regions. SHapley Additive exPlanations (SHAP)-based feature selection enhances interpretability by highlighting the most informative features, while preprocessing steps such as stain normalization and contrast enhancement improve model generalization and reduce sample variability. Evaluated on a publicly available dataset, the framework achieved 99.25% accuracy (ACC), 99.21% sensitivity, and a Matthews correlation coefficient (MCC) of 98.21%, outperforming existing methods. Ablation studies highlighted the importance of positional encoding, and statistical analyses confirmed the robustness and reliability of the results. To support real-time inference and scalable deployment, the proposed model has been integrated into a FastAPI-based web application. This framework offers a powerful, interpretable, and practical tool for early OC detection and has potential for integration into routine clinical workflows.
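For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, sensitivity, and MCC can be computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen purely for illustration; they are not taken from the study's data or code. MCC is returned here on its native scale of -1 to 1, whereas the figure above is expressed as a percentage.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity, and MCC from confusion-matrix counts.

    Hypothetical helper for illustration only; not the authors' code.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of actual positives correctly detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC uses all four cells, so it stays informative under class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "mcc": mcc}


# Illustrative counts only (assumed, not from the paper).
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(example)
```

Multiplying the returned `mcc` value by 100 gives the percentage form used in the text.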
Cruze et al. (Mon,) studied this question.