Knowledge distillation (KD) has been successfully applied to dense prediction tasks, and mainstream methods typically boost the student via spatial imitation losses. However, the consecutive downsampling operations applied in the spatial domain (e.g., in the feature pyramid network of a detector) act as a form of distortion: they make it hard for the student to determine which information should be imitated, which degrades accuracy. To better understand the underlying pattern of the corrupted feature maps, we shift our attention to frequency knowledge distillation and propose FreeKD, which determines the optimal location and extent of frequency distillation. (1) Frequency Prompts are plugged into the teacher model and absorb semantic frequency context during finetuning. During distillation, a pixel-wise frequency mask is generated from the Frequency Prompts to localize the pixels of interest in the various frequency bands. (2) A position-aware relational frequency loss is designed for dense prediction tasks, delivering a high-order spatial enhancement to the student model. Because a single distillation loss may not adequately capture both high- and low-frequency signals, especially given their contextual nuances, we enhance FreeKD with a frequency-decoupled strategy that applies relaxed alignment to high-frequency features and enforces stronger alignment for low-frequency features. Additionally, we refine the frequency masks by reconstructing the regions of interest based on the student's knowledge, thereby optimizing the distillation process. Extensive experiments on the widely used COCO, VisDrone, Cityscapes, ADE20K, and COCO-C datasets demonstrate the effectiveness of the proposed framework, FreeKD+.
Zhang et al. (Thu,) studied this question.
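To make the frequency-decoupled idea in the abstract concrete, the following is a minimal sketch and not the FreeKD+ implementation. The abstract does not specify the transform or the loss forms, so everything here is an assumption chosen for illustration: a one-level Haar wavelet split into one low-frequency band and three high-frequency detail bands, a mean-squared error as the "stronger" low-frequency alignment, and a correlation distance (insensitive to the scale and offset of the student's response) as the "relaxed" high-frequency alignment. The function names and the weights `w_low` and `w_high` are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One-level orthonormal 2D Haar transform of an (H, W) array, H and W even.

    Returns the low-frequency band and a tuple of three high-frequency
    detail bands, each of shape (H/2, W/2).
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    low = (a + b + c + d) / 2.0
    details = (
        (a + b - c - d) / 2.0,
        (a - b + c - d) / 2.0,
        (a - b - c + d) / 2.0,
    )
    return low, details


def mse(s, t):
    """Strict alignment (assumed form): penalizes any difference in value."""
    return float(np.mean((s - t) ** 2))


def correlation_distance(s, t, eps=1e-8):
    """Relaxed alignment (assumed form): 1 - Pearson correlation.

    Unchanged when the student band is scaled by a positive factor or shifted.
    """
    s = s.ravel() - s.mean()
    t = t.ravel() - t.mean()
    denom = np.linalg.norm(s) * np.linalg.norm(t) + eps
    return float(1.0 - np.dot(s, t) / denom)


def decoupled_frequency_loss(student, teacher, w_low=1.0, w_high=0.5):
    """Hypothetical decoupled loss: strict on the low band, relaxed on details."""
    s_low, s_details = haar_dwt2(student)
    t_low, t_details = haar_dwt2(teacher)
    low_term = mse(s_low, t_low)
    high_term = float(
        np.mean([correlation_distance(s, t) for s, t in zip(s_details, t_details)])
    )
    return w_low * low_term + w_high * high_term


# Usage on a single-channel toy feature map.
rng = np.random.default_rng(0)
teacher_feat = rng.standard_normal((8, 8))
student_feat = teacher_feat + 0.1 * rng.standard_normal((8, 8))
loss = decoupled_frequency_loss(student_feat, teacher_feat)
```

In this sketch the student is free to differ from the teacher in the magnitude of its high-frequency response, while its low-frequency content must match in value. The pixel-wise frequency mask described in the abstract is omitted; it could enter as a per-pixel weight on each band's term.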