What question did this study set out to answer?

The research aims to enhance rural settlement delineation from medium-resolution remote sensing imagery using a novel segmentation model.

February 14, 2026 | Open Access

Rural Settlement Segmentation in Large-Scale Remote Sensing Imagery Using MSF-AL Auto-Labeling and the SELPFormer Model

Key Points

Developed SELPFormer, a lightweight Transformer-based model for semantic segmentation.
Created a multi-source fusion automatic labeling pipeline (MSF-AL) integrating the Global Impervious Surface Dynamics dataset, OpenStreetMap spatial priors, and nighttime-light constraints.
Employed pyramid pooling and attention mechanisms to improve feature extraction and reduce background noise.
Evaluated performance using Landsat multispectral imagery covering five provinces on the North China Plain.
Achieved an IoU of 74.23%, an mIoU of 86.43%, and an F1 score of 85.21%.
Improved IoU over baseline models under a unified training and evaluation protocol: +1.44 percentage points over SegFormer, +3.98 over U-Net, and +12.35 over DeepLabV3+.
Attained a parameter efficiency of 3.93% IoU per million parameters with 15.44 M parameters, and an ROC-AUC of 0.993, indicating strong threshold-independent discriminative capability.
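As a worked check on the reported gains, the baseline IoU values they imply can be recovered by simple subtraction. This is a minimal sketch: the baseline figures are derived here from the numbers listed above rather than quoted from the paper, and the variable names are illustrative.

```python
# SELPFormer IoU (%) and its reported IoU gains (percentage points), as listed above.
SELPFORMER_IOU = 74.23
GAINS_PP = {"SegFormer": 1.44, "U-Net": 3.98, "DeepLabV3+": 12.35}

# Implied baseline IoU = SELPFormer IoU - reported gain.
# These values are derived for illustration, not quoted from the paper.
implied_baseline_iou = {
    model: round(SELPFORMER_IOU - gain, 2) for model, gain in GAINS_PP.items()
}

for model, iou in implied_baseline_iou.items():
    print(f"{model}: {iou:.2f}% IoU")
```

Because the gains are stated in percentage points, they are absolute differences in IoU, not relative improvements.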

Abstract

Accurate delineation of rural settlements at large spatial extents is fundamental to territorial spatial governance, rural revitalization, and the improvement of human living environments. However, in medium-resolution remote sensing imagery, rural settlement patches are typically small, morphologically complex, and easily confused with other impervious surfaces. As a result, existing products still fall short in characterizing these features. Here, we propose a lightweight Transformer-based semantic segmentation model, SELPFormer, and develop a multi-source fusion automatic labeling pipeline that integrates the Global Impervious Surface Dynamics dataset, OpenStreetMap spatial priors, and nighttime-light constraints. Built upon SegFormer as the backbone, SELPFormer introduces a lightweight pyramid pooling module at the deepest feature level to aggregate multi-scale global context and embeds an SCSE channel–spatial attention mechanism into deep features to suppress background interference. In addition, it incorporates an efficient local attention module into multi-scale lateral connections to enhance boundary and texture representations, thereby jointly improving small-object recognition and fine boundary preservation. We evaluate the proposed method using Landsat multispectral imagery covering five provinces on the North China Plain. SELPFormer achieves IoU = 74.23%, mIoU = 86.43%, F1 = 85.21%, OA = 98.69%, and Kappa = 0.8452 under a unified training and evaluation protocol, yielding IoU gains of +1.44, +3.98, and +12.35 percentage points over SegFormer, U-Net, and DeepLabV3+, respectively. SELPFormer has 15.44 M parameters and attains a parameter efficiency of 3.93% IoU per million parameters and an ROC-AUC of 0.993, indicating strong threshold-independent discriminative capability. These results indicate that the proposed method can effectively extract rural settlements from medium-resolution imagery and provides a generic “global–channel–local” collaborative framework for model design and data construction.
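The evaluation metrics named in the abstract (IoU, mIoU, F1, OA, and Kappa) can all be computed from a two-class pixel confusion matrix. The sketch below uses the standard definitions and assumes a binary settlement/background task; the function names and the toy pixel counts are hypothetical, and this is not the authors' evaluation code.

```python
def binary_segmentation_metrics(tp, fp, fn, tn):
    """Standard metrics for a 2-class (settlement vs. background) confusion matrix.

    tp, fp, fn, tn are pixel counts with settlement as the positive class.
    """
    n = tp + fp + fn + tn
    iou_fg = tp / (tp + fp + fn)          # IoU of the settlement class
    iou_bg = tn / (tn + fp + fn)          # IoU of the background class
    miou = (iou_fg + iou_bg) / 2          # mean IoU over both classes
    f1 = 2 * tp / (2 * tp + fp + fn)      # F1 of the settlement class
    oa = (tp + tn) / n                    # overall accuracy
    # Chance agreement for Cohen's kappa, from row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    return {"iou": iou_fg, "miou": miou, "f1": f1, "oa": oa, "kappa": kappa}


def f1_from_iou(iou):
    """For a single class, F1 and IoU are linked by F1 = 2*IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


# Toy pixel counts, chosen only for illustration.
metrics = binary_segmentation_metrics(tp=80, fp=10, fn=10, tn=900)
print(metrics)

# Consistency check on the reported numbers: IoU = 74.23% implies F1 of about 85.21%.
print(round(100 * f1_from_iou(0.7423), 2))
```

The identity in `f1_from_iou` shows why the reported IoU of 74.23% and F1 of 85.21% are mutually consistent: both describe the same settlement-class confusion counts.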

Cite This Study

Zhou et al. studied this question.

synapsesocial.com/papers/699010df2ccff479cfe572b4 https://doi.org/10.3390/rs18040579
