August 14, 2025Open Access

A Multi-Dimensional Feature Extraction Model Fusing Fractional-Order Fourier Transform and Convolutional Information

Key Points

The FCViT model achieved an impressive accuracy of 93.52% on the standardized fundus glaucoma dataset, significantly enhancing performance.
It reached 94.21% accuracy on the Harvard Dataverse-V1 dataset, indicating robust classification capabilities across diverse datasets.
By employing a three-branch architecture, the model captures both local and global features effectively in medical images.
This approach highlights limitations of traditional vision transformers, offering a promising solution for deep learning challenges in medical imaging.

Abstract

In the field of deep learning, the traditional Vision Transformer (ViT) model has some limitations when dealing with local details and long-range dependencies; especially in the absence of sufficient training data, it is prone to overfitting. Structures such as retinal blood vessels and lesion boundaries have distinct fractal properties in medical images. The Fractional Convolution Vision Transformer (FCViT) model is proposed in this paper, which effectively compensates for the deficiency of ViT in local feature capture by fusing convolutional information. The ability to classify medical images is enhanced by analyzing frequency domain features using fractional-order Fourier transform and capturing global information through a self-attention mechanism. The three-branch architecture enables the model to fully understand the data from multiple perspectives, capturing both local details and global context, which in turn improves classification performance and generalization. The experimental results showed that the FCViT model achieved 93.52% accuracy, 93.32% precision, 92.79% recall, and a 93.04% F1-score on the standardized fundus glaucoma dataset. The accuracy on the Harvard Dataverse-V1 dataset reached 94.21%, with a precision of 93.73%, recall of 93.67%, and F1-score of 93.68%. The FCViT model achieves significant performance gains on a variety of neural network architectures and tasks with different source datasets, demonstrating its effectiveness and utility in the field of deep learning.

A Multi-Dimensional Feature Extraction Model Fusing Fractional-Order Fourier Transform and Convolutional Information

Key Points

Abstract

Cite This Study