ABSTRACT
Medical imaging‐based classification models are crucial for disease diagnosis. While Convolutional Neural Networks (CNNs) and Transformers have led to significant advancements in medical image analysis, both face inherent limitations: CNNs struggle to capture long‐range dependencies, and Transformers require large datasets and high computational resources. To address these challenges, we propose the Vision KAN‐Transformer (VKT), a novel framework that integrates Kolmogorov–Arnold Networks (KAN) with Transformers to enhance feature extraction from medical images. The VKT architecture employs a dual‐branch design, consisting of the ConvKan branch and the Local–Global (LG) Attention branch. The ConvKan branch incorporates a convolution module and a TCKan module, which enhance cross‐channel feature interactions through parallel Token Kan and Channel Kan operations. The LG Attention branch combines Local and Global Attention mechanisms to capture both fine‐grained structures and global semantic information. By fusing the outputs from both branches, VKT leverages the strengths of CNNs in local feature extraction and the power of Transformers in modeling global information, while the KAN mechanism enhances interpretability. Extensive experiments on seven public datasets, including two for cross‐validation, demonstrate that VKT consistently outperforms state‐of‐the‐art models. Furthermore, VKT exhibits lower computational cost and model complexity, making it a promising solution for clinical applications. The code is publicly available at: https://github.com/fyy617/VKT
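As a rough illustration of the parallel Token Kan and Channel Kan operations mentioned above, the following NumPy sketch applies one KAN-style layer along the token axis and another along the channel axis, then sums the results. It is not taken from the VKT repository. The names `KANLayer`, `rbf_basis`, and `tc_kan`, the Gaussian radial basis (used here in place of the B-splines common in KAN implementations), the residual connection, and all sizes are assumptions made only for this example. The single grounded element is the general KAN form, in which each output is a sum of learnable univariate functions of the inputs.

```python
import numpy as np


def rbf_basis(x, centers, width):
    """Evaluate K Gaussian bumps at every entry of x: (...,) -> (..., K)."""
    return np.exp(-(((x[..., None] - centers) / width) ** 2))


class KANLayer:
    """Hypothetical KAN-style layer: y_j = sum_i phi_ij(x_i).

    Each edge function phi_ij is a learnable combination of fixed basis
    functions: phi_ij(x) = sum_k coef[i, j, k] * basis_k(x).
    """

    def __init__(self, in_dim, out_dim, num_basis=5, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.centers = np.linspace(-2.0, 2.0, num_basis)
        self.width = 4.0 / (num_basis - 1)
        # Random coefficients stand in for trained parameters.
        self.coef = rng.normal(0.0, 0.1, size=(in_dim, out_dim, num_basis))

    def __call__(self, x):
        # x: (..., in_dim) -> (..., out_dim)
        basis = rbf_basis(x, self.centers, self.width)  # (..., in_dim, K)
        return np.einsum("...ik,iok->...o", basis, self.coef)


def tc_kan(x, token_kan, channel_kan):
    """Assumed parallel token/channel mixing for x of shape (tokens, channels)."""
    token_mixed = token_kan(x.T).T    # mix information across tokens
    channel_mixed = channel_kan(x)    # mix information across channels
    return x + token_mixed + channel_mixed  # residual sum is an assumption


rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))          # 16 tokens, 8 channels (illustrative)
token_kan = KANLayer(16, 16, rng=rng)
channel_kan = KANLayer(8, 8, rng=rng)
out = tc_kan(x, token_kan, channel_kan)
print(out.shape)  # (16, 8)
```

Because every edge function is a one-dimensional curve, the learned `coef[i, j, :]` can be plotted directly, which is the usual basis for the interpretability claim made for KAN-based models.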