Lung cancer remains the leading cause of cancer-related mortality worldwide, and accurate subtype classification is critical for improving patient outcomes. Vision Transformers (ViTs) have recently advanced medical image analysis, but their integration with quantum computing remains underexplored, leaving a gap in leveraging quantum advantages for clinical applications. This study proposes a quantum-enhanced Vision Transformer (QViT) that embeds variational quantum circuits (VQCs) within the attention and feed-forward layers to enrich feature learning. To assess robustness, two circuit families, a Basic VQC and a Quantum Approximate Optimization Algorithm (QAOA)-based VQC, were each evaluated at two depths, yielding four QViT configurations. The models were analyzed through cancer classification tasks, systematic quantum state tomography (QST) to examine quantum state properties, and noise resilience testing under realistic Noisy Intermediate-Scale Quantum (NISQ) conditions that reflect current hardware limitations. On 3150 computed tomography (CT) images, the QViT-QAOA-D1 configuration achieved 98.52% accuracy with a training time of 674 s, outperforming the other three configurations. Quantum analyses further showed that the Basic VQC provides stronger entanglement but requires deeper, less NISQ-friendly circuits, whereas the QAOA-based VQC maintains high-purity states at shallower depths. These results suggest that QViT provides a practical and scalable framework in which quantum-enhanced representations can improve diagnostic performance while remaining compatible with current hardware constraints. Future work will focus on validation using patient-level datasets and deployment on real quantum hardware to further assess clinical applicability and scalability.
Mustofa et al. studied this question.
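The abstract compares circuit families by the entanglement and purity of the states they prepare. As a rough illustration of what those two quantities measure, the following NumPy sketch simulates a toy two-qubit circuit (RY rotations followed by a CNOT) and computes the purity and von Neumann entropy of one qubit's reduced state. The circuit, the function names, and the parameter values are assumptions made for illustration only; they are not the Basic VQC or QAOA ansatz used in the study, and the sketch uses exact state vectors rather than tomography estimates.

```python
import numpy as np

# Hypothetical toy circuit, NOT the ansatz from the study.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)  # control: qubit 0, target: qubit 1


def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def two_qubit_state(theta0, theta1):
    """Apply RY(theta0) on qubit 0 and RY(theta1) on qubit 1 to |00>, then CNOT."""
    psi = np.zeros(4, dtype=complex)
    psi[0] = 1.0
    psi = np.kron(ry(theta0), ry(theta1)) @ psi
    return CNOT @ psi


def reduced_density_matrix(psi):
    """Trace out qubit 1 and return the 2x2 density matrix of qubit 0."""
    m = psi.reshape(2, 2)  # rows index qubit 0, columns index qubit 1
    return m @ m.conj().T


def purity(rho):
    """Tr(rho^2): 1 for a pure state, 1/2 for a maximally mixed qubit."""
    return float(np.real(np.trace(rho @ rho)))


def entanglement_entropy(rho):
    """Von Neumann entropy in bits of a reduced state."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop zero eigenvalues to avoid log(0)
    return float(-np.sum(evals * np.log2(evals)))


if __name__ == "__main__":
    for theta0 in (0.0, np.pi / 4, np.pi / 2):
        rho = reduced_density_matrix(two_qubit_state(theta0, 0.0))
        print(theta0, purity(rho), entanglement_entropy(rho))
```

In this toy setting, a lower reduced-state purity and a higher entropy both indicate stronger entanglement between the two qubits. On real hardware these quantities would be estimated from tomography measurements and would also be affected by noise.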