
April 15, 2026 · Open Access

SDTformer: Scale-Adaptive Differential Transformer Network for Remote Sensing Image Dehazing

Key Points

Aimed to develop a Transformer-based model that suppresses attention noise in remote sensing image dehazing.
Developed the SDTformer architecture, built around a differential attention mechanism.
Implemented a scale-adaptive differential self-attention module that models contextual dependencies across spatial scales and reduces redundant contextual interference.
Used a dynamic differential feed-forward network to adaptively select informative spatial features and strengthen feature aggregation.
Introduced a gated fusion module that aggregates multi-scale features from different encoder blocks to support each decoder block.
Improved reconstruction fidelity in remote sensing images by suppressing attention noise.
Reported favorable performance against state-of-the-art approaches on commonly used benchmarks.

Abstract

In Transformer-based image restoration models, the self-attention mechanism often introduces attention noise from irrelevant contextual features, hindering the recovery of the underlying clear content. Although many methods have been proposed to suppress attention noise, most existing approaches are developed for general vision tasks and generalize poorly to remote sensing image dehazing, where large-scale spatial structures pose additional challenges for attention modeling. Modeling scale-aware attention that suppresses redundant activations is therefore crucial for this task. In this paper, we propose a scale-adaptive differential Transformer (SDTformer), an architecture designed to suppress attention noise through a differential attention mechanism, thereby improving reconstruction fidelity. Specifically, the model incorporates a scale-adaptive differential self-attention module, which models contextual dependencies across different spatial scales and reduces redundant contextual interference by computing differential attention maps. Additionally, a dynamic differential feed-forward network is proposed to adaptively select informative spatial features, strengthening feature aggregation. To further enhance feature representation, a gated fusion module is introduced to aggregate the multi-scale features generated by different encoder blocks, which facilitates the learning of each decoder block and improves the final reconstruction performance. Extensive experiments on commonly used benchmarks show that our method achieves favorable performance against state-of-the-art approaches.
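
The abstract does not give the exact formulation of the differential attention maps, so the following is only a minimal sketch of the general idea, assuming the common form in which two softmax attention maps are computed from separate query/key projections and the second is subtracted from the first with a weight. The function names, the fixed scalar `lam`, and the toy inputs are illustrative assumptions, not the paper's implementation, and the scale-adaptive part of the module is not modeled here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(Q, K):
    """Row-wise softmax of the scaled dot-product scores Q K^T / sqrt(d)."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K])
        for q in Q
    ]


def differential_attention(Q1, K1, Q2, K2, V, lam=0.5):
    """Assumed form: (softmax(Q1 K1^T) - lam * softmax(Q2 K2^T)) V.

    `lam` is a fixed scalar here (hypothetical); in practice it would
    typically be a learned parameter.
    """
    A1 = attention_map(Q1, K1)
    A2 = attention_map(Q2, K2)
    # Subtracting the second map cancels attention mass that both maps share.
    diff = [
        [a1 - lam * a2 for a1, a2 in zip(r1, r2)]
        for r1, r2 in zip(A1, A2)
    ]
    # Apply the differential map to the values (plain matrix product).
    out = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in diff
    ]
    return diff, out


# Toy inputs: two tokens with two-dimensional features (illustrative only).
Q1 = [[1.0, 0.0], [0.0, 1.0]]
K1 = [[1.0, 0.0], [0.0, 1.0]]
Q2 = [[0.5, 0.5], [0.5, 0.5]]
K2 = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]

diff_map, output = differential_attention(Q1, K1, Q2, K2, V, lam=0.5)
```

Because each softmax row sums to 1, every row of the differential map sums to `1 - lam`, and when both branches produce the same map with `lam = 1` the output vanishes entirely. This is the sense in which attention that is common to both maps, treated as noise, is cancelled, while attention that only the first map assigns is kept.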
