ABSTRACT Soil degradation significantly impacts global agricultural productivity, making accurate soil classification crucial for sustainable land management. To address the limitations of traditional surveys and computationally intensive models, we propose a lightweight Vision Transformer that integrates TinyViT's hierarchical compression with DropKey's dynamic sparse attention. This architecture enables high‐accuracy soil classification suitable for edge computing. A hybrid data augmentation strategy mitigates class imbalance in soil datasets. Evaluation on 1915 soil images shows that our model achieves 98.42% accuracy with only 5.40 M parameters and 1.19 G FLOPs, outperforming mainstream counterparts. Although the study is limited by dataset scale and controlled lighting conditions, DropKey proves particularly effective for soil texture analysis, filtering noise while preserving critical textural boundaries. The framework balances accuracy and efficiency, facilitating deployment on resource‐constrained devices.
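To illustrate the DropKey idea mentioned in the abstract, the following is a minimal NumPy sketch of single-head attention in which randomly chosen key logits are masked before the softmax. It assumes the commonly described DropKey formulation; the function name `dropkey_attention`, the `drop_ratio` value, and the single-head layout are illustrative assumptions, not the configuration used in this work.

```python
import numpy as np

def dropkey_attention(q, k, v, drop_ratio=0.1, training=True, rng=None):
    """Scaled dot-product attention with DropKey-style key masking.

    q, k, v: arrays of shape (tokens, dim). Hypothetical helper for
    illustration; not the authors' implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = q.shape[-1] ** -0.5
    logits = (q @ k.T) * scale  # (tokens, tokens) attention logits

    if training and drop_ratio > 0.0:
        # Drop keys BEFORE softmax: masked logits get a large negative
        # value, so the remaining keys are renormalised by the softmax.
        drop = rng.random(logits.shape) < drop_ratio
        logits = np.where(drop, -1e9, logits)

    # Numerically stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Because the mask is applied to the logits rather than to the softmax output, each row of attention weights still sums to one during training. At inference time (`training=False`) no keys are dropped.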
Zhang et al. (Mon,) studied this question.