What question did this study set out to answer?

The central aim is to develop an efficient super-resolution model that enhances image quality while reducing computational cost.

May 13, 2026 · Open Access

DTKD: Diffusion-to-Transformer Heterogeneous Knowledge Distillation for Efficient and Perceptually Enhanced Super-Resolution

Key Points

Proposes DTKD, a heterogeneous knowledge distillation framework that transfers a diffusion teacher's perceptual prior to a transformer student
Introduces a frequency-group-aware distillation loss based on a two-level discrete wavelet transform (DWT)
Adopts a progressive schedule that gradually increases the distillation weight during training
Improves perceptual quality over a standalone transformer student
Maintains transformer-level inference efficiency
Ablation studies support the importance of moderate frequency decomposition, discrepancy-aware weighting, and progressive scheduling

Abstract

Single-image super-resolution (SISR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs and remains fundamentally ill-posed due to the inherent ambiguity of missing high-frequency details. While diffusion-based SR models achieve superior perceptual quality through iterative denoising, their multi-step sampling process results in substantial computational cost and latency. In contrast, transformer-based SR models offer efficient single-forward inference but are typically optimized for distortion-oriented objectives, limiting perceptual realism. In this paper, we propose DTKD, a diffusion-to-transformer heterogeneous knowledge distillation framework that transfers the perceptual prior of a diffusion teacher into an efficient transformer student. To effectively bridge the representational gap between generative diffusion outputs and deterministic transformer reconstructions, we introduce a frequency-group-aware distillation loss based on two-level discrete wavelet transform (DWT). The loss decomposes images into structured frequency sub-bands and assigns non-uniform weights to emphasize discrepancy-sensitive mid-frequency components. Furthermore, we adopt a progressive scheduling strategy that gradually increases the distillation weight during training to stabilize optimization and balance structural fidelity with perceptual enhancement. Extensive experiments on real-world SR benchmarks demonstrate that the proposed framework consistently improves perceptual quality over a standalone transformer student while maintaining transformer-level inference efficiency. Ablation studies further validate the importance of moderate frequency decomposition, discrepancy-aware weighting, and progressive distillation scheduling. These results suggest that heterogeneous distillation provides an effective and practical approach for transferring diffusion-based generative priors into efficient super-resolution models.
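
The two training-side ideas in the abstract, a frequency-group-aware loss over a two-level DWT and a distillation weight that grows during training, can be illustrated with a small NumPy sketch. This is not the paper's implementation: the Haar wavelet, the grouping of sub-bands into "low", "mid", and "high", the example weights, the L1 distance, the linear ramp, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def haar_dwt2(x):
    """One level of the orthonormal 2D Haar DWT for an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    details = ((a + b - c - d) / 2.0, (a - b + c - d) / 2.0, (a - b - c + d) / 2.0)
    return approx, details


def frequency_groups(img):
    """Two-level decomposition grouped by frequency (the grouping is an assumption)."""
    if img.ndim != 2 or img.shape[0] % 4 or img.shape[1] % 4:
        raise ValueError("expected a 2D array with height and width divisible by 4")
    approx1, details1 = haar_dwt2(img.astype(np.float64))
    approx2, details2 = haar_dwt2(approx1)
    # low: coarsest approximation, mid: level-2 details, high: level-1 details
    return {"low": (approx2,), "mid": details2, "high": details1}


# Hypothetical weights that emphasize the mid-frequency group.
DEFAULT_WEIGHTS = {"low": 0.5, "mid": 2.0, "high": 1.0}


def frequency_group_loss(student, teacher, weights=None):
    """Weighted L1 distance between student and teacher sub-bands."""
    weights = weights or DEFAULT_WEIGHTS
    s, t = frequency_groups(student), frequency_groups(teacher)
    total = 0.0
    for group, w in weights.items():
        per_band = [np.mean(np.abs(sb - tb)) for sb, tb in zip(s[group], t[group])]
        total += w * float(np.mean(per_band))
    return total


def distill_weight(step, ramp_steps, lam_max=1.0):
    """Linear ramp from 0 to lam_max over ramp_steps (one possible progressive schedule)."""
    return lam_max * min(1.0, max(0.0, step / ramp_steps))


def total_loss(student, teacher, target, step, ramp_steps):
    """L1 reconstruction loss plus the scheduled distillation term."""
    recon = float(np.mean(np.abs(student - target)))
    return recon + distill_weight(step, ramp_steps) * frequency_group_loss(student, teacher)
```

In this sketch, `student` and `teacher` stand for single-channel outputs of the two models on the same input and `target` for the ground-truth HR image. Early in training the reconstruction term dominates, and the teacher-matching term is phased in as `step` approaches `ramp_steps`, which mirrors the stabilizing role the abstract attributes to progressive scheduling.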
