What does this research mean for the field?

The Vision KAN-Transformer (VKT) model outperforms state-of-the-art models in medical image classification while exhibiting lower computational cost and model complexity. Novelty: ClaimNovelty.METHODOLOGICAL. Consensus alignment: ConsensusAlignment.NEUTRAL.

What question did this study set out to answer?

The aim is to develop a hybrid model that improves medical image classification by combining KAN and Transformer architectures.

March 10, 2026

VKT : A Hybrid Vision KAN ‐Transformer Model for Medical Image Classification

Key Points

The aim is to develop a hybrid model that improves medical image classification by combining KAN and Transformer architectures.
Introduced the Vision KAN-Transformer (VKT) architecture with dual branches: ConvKan and LG Attention.
Utilized a convolution module and TCKan module in the ConvKan branch for improved feature interactions.
Implemented Local and Global Attention mechanisms in the LG Attention branch for capturing fine and global information.
VKT consistently outperformed state-of-the-art models across seven public datasets.
Demonstrated lower computational costs compared to previous methods.
Showed enhanced interpretability due to the KAN mechanism.

Abstract

ABSTRACT Medical imaging‐based classification models are crucial for disease diagnosis. While Convolutional Neural Networks (CNNs) and Transformers have led to significant advancements in medical image analysis, both face inherent limitations: CNNs struggle to capture long‐range dependencies, and Transformers require large datasets and high computational resources. To address these challenges, we propose the Vision KAN‐Transformer (VKT), a novel framework that integrates Kolmogorov–Arnold Networks (KAN) with Transformers to enhance feature extraction from medical images. The VKT architecture employs a dual‐branch design, consisting of the ConvKan branch and the Local–Global (LG) Attention branch. The ConvKan branch incorporates a convolution module and a TCKan module, which enhance cross‐channel feature interactions through parallel Token Kan and Channel Kan operations. The LG Attention branch combines Local and Global Attention mechanisms to capture both fine‐grained structures and global semantic information. By fusing the outputs from both branches, VKT leverages the strengths of CNNs in local feature extraction and the power of Transformers in modeling global information, while the KAN mechanism enhances interpretability. Extensive experiments on seven public datasets, including two for cross‐validation, demonstrate that VKT consistently outperforms state‐of‐the‐art models. Furthermore, VKT exhibits lower computational cost and model complexity, making it a promising solution for clinical applications. The code is publicly available at: https://github.com/fyy617/VKT .

AI से पूछें

Bookmark

Cite This Study

He et al. (Sun,) studied this question.

synapsesocial.com/papers/69af95de70916d39fea4de9b https://doi.org/https://doi.org/10.1002/ima.70334

Also Consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

AI से पूछें

Bookmark