Aerial image segmentation typically focuses on broad categories rather than fine details. Moreover, objects of the same class often exhibit large variations in texture and scale, while long-range pixel dependencies further complicate segmentation. To address these challenges, we propose VFM-CAKD, a category-aware knowledge distillation method that leverages strong Vision Foundation Models (VFMs) to enhance lightweight segmentation models. First, we introduce learnable category-aware vectors to filter redundant interference in VFM features, generating a Category-Aware Guidance (CAG) token. Next, to handle intra-class variations, we design a Category-Aware-Guided Dual-Scale Feature Fusion (CAG-DSFusion) strategy, which enhances feature fusion by activating category-aware components, producing more robust category prototypes for effective teacher-student knowledge transfer. Additionally, we propose a Category-Aware-Guided Global Attention module (CAG-Global) to align global feature representations between teacher and student networks, facilitating the transfer of long-range dependency modeling. Experiments on drone-view datasets (Aeroscapes and UAVid) and aerial remote sensing datasets (Potsdam and Vaihingen) demonstrate that VFM-CAKD effectively transfers VFM knowledge to improve student model performance. The code is available at https://github.com/aresdrw/VFM-CAKD.
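To make the idea of transferring category prototypes from teacher to student concrete, the following is a minimal sketch and not the VFM-CAKD implementation. It assumes prototypes are obtained by plain masked average pooling (the mean feature vector of all pixels in a class) and that teacher and student are aligned with a cosine-distance loss; CAG-DSFusion and the CAG token are not modeled. The function names `class_prototypes` and `prototype_distill_loss` are hypothetical, and teacher and student features are assumed to share the same dimension.

```python
import math

def class_prototypes(features, labels, num_classes):
    """Masked average pooling (assumed stand-in for the paper's prototypes).

    features: list of per-pixel feature vectors (lists of floats)
    labels:   list of per-pixel class ids, same length as features
    Returns one mean vector per class, or None for classes with no pixels.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for vec, c in zip(features, labels):
        counts[c] += 1
        for i, v in enumerate(vec):
            sums[c][i] += v
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(num_classes)
    ]

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)

def prototype_distill_loss(teacher_protos, student_protos):
    """Mean (1 - cosine similarity) over classes present in both networks."""
    terms = [
        1.0 - cosine(t, s)
        for t, s in zip(teacher_protos, student_protos)
        if t is not None and s is not None
    ]
    return sum(terms) / len(terms) if terms else 0.0

# Toy example: three pixels, two feature channels, two classes.
labels = [0, 0, 1]
teacher_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
student_feats = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
loss = prototype_distill_loss(
    class_prototypes(teacher_feats, labels, 2),
    class_prototypes(student_feats, labels, 2),
)
```

In this toy case the student's class-0 prototype is orthogonal to the teacher's while the class-1 prototypes point in the same direction, so the loss is driven entirely by class 0. In a real pipeline the student features would typically pass through a projection layer first so that their dimension matches the teacher's.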
Wang et al. (Fri,) studied this question.