What question did this study set out to answer?

The central aim is to enhance remote sensing scene classification by effectively utilizing frequency domain and gradient prior information.

February 8, 2026
Open Access

Frequency Domain and Gradient-Spatial Multi-Scale Swin KANsformer for Remote Sensing Scene Classification

Key Points

Proposes the FG-Swin KANsformer model, which integrates frequency domain and gradient prior information from raw images.
Employs a Discrete Cosine Transform (DCT) module to extract frequency domain features.
Uses a gradient-spatial feature extraction (GSFE) module, built on multi-scale convolutions and Sobel operators, to capture spatial structure and gradient information at multiple scales.
Replaces the MLP in the Swin Transformer Blocks with a Kolmogorov–Arnold Network (KAN) for stronger nonlinear modeling.
Achieves strong performance on three public remote sensing datasets.
KAN-based modeling improves feature discrimination, which leads to higher classification accuracy.
Combines global semantic information with detailed local image features.
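The summary does not say how the DCT module is configured. A common way to turn an image into frequency domain features is a JPEG-style block-wise 2D DCT-II, with the coefficients of each block regrouped into channels. The sketch below assumes a single-channel image and 8x8 blocks; `dct_matrix` and `blockwise_dct_features` are hypothetical names for illustration, not the paper's code.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix C, so that C @ x is the DCT-II of x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row rescaled so that C @ C.T == I
    return c


def blockwise_dct_features(img: np.ndarray, block: int = 8) -> np.ndarray:
    """Apply a 2D DCT-II to each block x block tile of a (H, W) image.

    Assumption: JPEG-style non-overlapping tiles. The coefficient (u, v) of
    every tile is gathered into channel u * block + v, giving an output of
    shape (block * block, H // block, W // block).
    """
    h, w = img.shape
    if h % block or w % block:
        raise ValueError("image height and width must be divisible by block")
    c = dct_matrix(block)
    # (H, W) -> (H/b, W/b, b, b): one b x b tile per spatial position
    tiles = img.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    # separable 2D DCT of every tile at once: C @ X @ C^T
    coeffs = c @ tiles @ c.T
    # move the flattened (u, v) frequency index to the channel axis
    return coeffs.reshape(h // block, w // block, block * block).transpose(2, 0, 1)
```

Because the basis is orthonormal, the transform preserves the signal energy, and a flat tile puts all of its energy into the DC channel (index 0). For an RGB input, one option is to apply the function per color channel and concatenate the results; which variant the DCT module actually uses is not stated in the summary.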

Abstract

Transformer-based deep learning techniques have recently shown outstanding potential in remote sensing scene classification (RSSC), benefiting from their ability to capture global semantic relationships and contextual dependencies. However, effectively utilizing the raw image and global semantic information while also accounting for detailed features and multi-scale spatial relationships remains a major challenge. Therefore, this paper proposes a novel FG-Swin KANsformer model that integrates frequency domain and gradient prior information from raw images with the Kolmogorov–Arnold Network (KAN) to enhance nonlinear feature modeling. The FG-Swin KANsformer consists of three key components: the Discrete Cosine Transform (DCT) module, the gradient-spatial feature extraction (GSFE) module, and the Swin Transformer module integrated with KAN. In the feature embedding phase, the DCT module extracts frequency domain features, while the GSFE module uses multi-scale convolutions and Sobel operators to extract spatial structures and gradient information at different scales, thereby making better use of the frequency domain and gradient prior information in the original image. In the Swin Transformer feature modeling phase, the conventional multilayer perceptron (MLP) in the Swin Transformer Blocks is replaced by KAN, which decomposes complex multivariate functions into compositions of univariate functions, thereby improving nonlinear representation capacity and enhancing feature discrimination. Thorough experiments on three distinct public remote sensing (RS) datasets demonstrate the outstanding performance of FG-Swin KANsformer.
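The GSFE module is described as combining multi-scale convolutions with Sobel operators. As a rough, non-learned illustration of that idea, the sketch below smooths a grayscale image at several kernel sizes and computes the Sobel gradient magnitude at each scale. Fixed box filters of size 3, 5 and 7 are an assumption standing in for the module's learned multi-scale convolutions, and `filter2d` and `gradient_spatial_features` are hypothetical names.

```python
import numpy as np

# Standard 3x3 Sobel kernels (horizontal and vertical derivatives)
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-size 2D cross-correlation with edge-replicated padding (odd kernels)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for di in range(kh):
        for dj in range(kw):
            # accumulate each kernel tap over a shifted view of the image
            out += kernel[di, dj] * padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out


def gradient_spatial_features(img: np.ndarray, scales=(3, 5, 7)) -> np.ndarray:
    """Interleave smoothed maps and Sobel gradient magnitudes per scale.

    Returns shape (2 * len(scales), H, W): [smooth_3, grad_3, smooth_5, grad_5, ...].
    """
    feats = []
    for k in scales:
        box = np.full((k, k), 1.0 / (k * k))  # assumption: stand-in for a learned k x k conv
        smoothed = filter2d(img, box)
        gx = filter2d(smoothed, SOBEL_X)
        gy = filter2d(smoothed, SOBEL_Y)
        feats.append(smoothed)
        feats.append(np.hypot(gx, gy))  # gradient magnitude at this scale
    return np.stack(feats)
```

Larger kernels suppress fine texture, so their gradient maps respond mainly to coarse structures such as field or building boundaries, while the 3x3 scale keeps small edges. In the actual module the convolution weights would be learned and the outputs fed into the embedding stage; this sketch only shows how spatial and gradient cues at several scales can be stacked as channels.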
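The abstract describes KAN as decomposing multivariate functions into compositions of univariate functions. The NumPy sketch below shows that idea for a single layer: each input-output edge carries its own learnable univariate function, and each output sums those functions over the inputs, so there is no multiplicative interaction between inputs inside one layer (interactions arise only when layers are stacked). It is a forward-pass illustration under assumptions: Gaussian radial basis functions stand in for the B-spline parameterization of the original KAN, the grid range and random initialization are arbitrary, and `KANLayer` is a hypothetical name, not the FG-Swin KANsformer code.

```python
import numpy as np


class KANLayer:
    """Forward pass of a Kolmogorov-Arnold layer (illustrative sketch).

    Each edge p -> q has a univariate function
        phi_qp(x) = w_base[q, p] * silu(x) + sum_k coef[q, p, k] * exp(-((x - g_k) / h) ** 2)
    and output q is y_q = sum_p phi_qp(x_p).
    Assumption: Gaussian RBFs replace the B-splines of the original KAN.
    """

    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0), seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(grid_range[0], grid_range[1], num_basis)  # RBF centers g_k
        self.h = self.grid[1] - self.grid[0]  # RBF width set to the grid spacing
        self.coef = rng.normal(0.0, 0.1, size=(out_dim, in_dim, num_basis))
        self.w_base = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(out_dim, in_dim))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """Map x of shape (batch, in_dim) to (batch, out_dim)."""
        silu = x / (1.0 + np.exp(-x))  # residual "base" activation, applied per input
        # basis[b, p, k]: k-th RBF evaluated on the single scalar x[b, p]
        basis = np.exp(-(((x[..., None] - self.grid) / self.h) ** 2))
        # sum the edge functions over inputs p and basis index k
        spline_part = np.einsum("bpk,qpk->bq", basis, self.coef)
        base_part = silu @ self.w_base.T
        return base_part + spline_part
```

In a Swin Transformer Block, two such layers (C to a hidden width, then back to C) could take the place of the two linear layers of the MLP, applied to each token's C-dimensional feature vector. Training them would require an autodiff framework rather than this NumPy forward pass. The additive structure can be checked directly: f(a, b) - f(a, 0) - f(0, b) + f(0, 0) vanishes for any a and b.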

Cite This Study

Zhu et al. (2026) studied this question.

synapsesocial.com/papers/698828eb0fc35cd7a8848d7e
https://doi.org/10.3390/rs18030517
