Accurate delineation of rural settlements at large spatial extents is fundamental to territorial spatial governance, rural revitalization, and the improvement of human living environments. However, in medium-resolution remote sensing imagery, rural settlement patches are typically small, morphologically complex, and easily confused with other impervious surfaces. As a result, existing products still fall short in characterizing these features. Here, we propose a lightweight Transformer-based semantic segmentation model, SELPFormer, and develop a multi-source fusion automatic labeling pipeline that integrates the Global Impervious Surface Dynamics dataset, OpenStreetMap spatial priors, and nighttime-light constraints. Built on a SegFormer backbone, SELPFormer introduces a lightweight pyramid pooling module at the deepest feature level to aggregate multi-scale global context and embeds an SCSE channel–spatial attention mechanism into the deep features to suppress background interference. In addition, it incorporates an efficient local attention module into the multi-scale lateral connections to enhance boundary and texture representations, thereby jointly improving small-object recognition and fine boundary preservation. We evaluate the proposed method on Landsat multispectral imagery covering five provinces of the North China Plain. Under a unified training and evaluation protocol, SELPFormer achieves IoU = 74.23%, mIoU = 86.43%, F1 = 85.21%, OA = 98.69%, and Kappa = 0.8452, yielding IoU gains of +1.44, +3.98, and +12.35 percentage points over SegFormer, U-Net, and DeepLabV3+, respectively. SELPFormer has 15.44 M parameters and attains a parameter efficiency of 3.93% IoU per million parameters and an ROC-AUC of 0.993, indicating strong threshold-independent discriminative capability.
These results indicate that the proposed method can effectively extract rural settlements from medium-resolution imagery and provides a generic “global–channel–local” collaborative framework for model design and data construction.
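For readers who want to relate the reported scores to one another, the following minimal sketch computes IoU, mIoU, F1, OA, and Kappa from the four cells of a binary confusion matrix. It assumes the standard definitions of these metrics for a two-class (settlement versus background) task; the function name `binary_segmentation_metrics` and the pixel counts in the usage lines are illustrative and are not taken from the paper or its evaluation code.

```python
def binary_segmentation_metrics(tp, fp, fn, tn):
    """Standard binary segmentation metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes both classes are present
    and at least one pixel is predicted per class (no zero denominators).
    """
    total = tp + fp + fn + tn

    # Intersection over union for the foreground (settlement) class
    iou_fg = tp / (tp + fp + fn)
    # IoU for the background class, then the two-class mean
    iou_bg = tn / (tn + fp + fn)
    miou = (iou_fg + iou_bg) / 2

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Overall accuracy: fraction of correctly classified pixels
    oa = (tp + tn) / total

    # Cohen's kappa: observed agreement corrected for chance agreement
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (oa - pe) / (1 - pe)

    return {"iou": iou_fg, "miou": miou, "f1": f1, "oa": oa, "kappa": kappa}


# Illustrative pixel counts only (not data from the study)
metrics = binary_segmentation_metrics(tp=80, fp=10, fn=10, tn=900)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, foreground F1 and IoU are tied by F1 = 2·IoU / (1 + IoU), which is consistent with the reported pair (IoU = 74.23%, F1 = 85.21%).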
Zhou et al. (Thu,) studied this question.