Existing deep learning methods for tomato leaf disease detection struggle to balance computational efficiency with fine-grained feature capture in complex field environments. To address this challenge, this paper proposes a novel lightweight classification model, Visual Mamba with Frequency-channel attention, Cross-layer attention and Salient feature suppression (VMamba-FCS). Built on the visual state-space model, VMamba-FCS integrates three collaborative enhancement mechanisms: a frequency-domain channel attention module, which improves the perception of disease-related textures by recalibrating features in the frequency domain; a cross-layer attention module, which promotes multi-scale feature fusion by integrating the semantic context of early layers; and a salient feature suppression module, which suppresses overactivated feature regions during training, forcing the network to learn more comprehensive discriminative features and thereby improving robustness. Experimental results on the real-world field dataset “Tomato-Village” show that VMamba-FCS achieves a classification accuracy of 93.62% and an inference speed of 126.5 frames per second (FPS) with only 1.20 M parameters, a 7.48% improvement in accuracy over the baseline VMamba model. In a cross-dataset generalization test on PlantDoc, VMamba-FCS outperforms all comparison models with an accuracy of 71.3%, demonstrating strong domain adaptability and robustness. This work verifies the effectiveness of the multi-mechanism collaborative enhancement strategy within the state-space model architecture and provides a new lightweight solution for real-time, accurate agricultural disease detection on resource-constrained edge devices.
Liu et al. studied this question.
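To make two of the mechanisms named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single feature map of shape `(C, H, W)`. The function names, the `ratio` and `scale` parameters, the choice of DCT frequencies, and the two-layer gating weights `w1` and `w2` are all hypothetical choices made for illustration; the paper's actual modules may differ in every one of these details.

```python
import numpy as np


def suppress_salient(features, ratio=0.1, scale=0.5, training=True):
    """Damp the most strongly activated spatial positions (illustrative only).

    features: array of shape (C, H, W).
    ratio:    assumed fraction of spatial positions treated as salient.
    scale:    assumed factor applied to the salient positions.
    """
    if not training:
        # Suppression is described as a training-time mechanism.
        return features
    _, h, w = features.shape
    saliency = features.mean(axis=0)  # (H, W) channel-averaged activation
    k = max(1, int(round(ratio * h * w)))
    # Value of the k-th largest saliency score; ties may suppress more than k.
    threshold = np.partition(saliency.ravel(), -k)[-k]
    mask = np.where(saliency >= threshold, scale, 1.0)
    return features * mask[None, :, :]


def dct_basis(h, w, u, v):
    """2D DCT-II basis function of frequency (u, v) on an h x w grid."""
    ys = np.arange(h)[:, None]
    xs = np.arange(w)[None, :]
    return np.cos(np.pi * u * (ys + 0.5) / h) * np.cos(np.pi * v * (xs + 0.5) / w)


def frequency_channel_attention(features, w1, w2, freqs=((0, 0), (0, 1), (1, 0))):
    """Reweight channels using one DCT coefficient per channel (illustrative only).

    w1: (hidden, C) and w2: (C, hidden) are hypothetical gating weights.
    """
    c, h, w = features.shape
    descriptor = np.empty(c)
    for i in range(c):
        u, v = freqs[i % len(freqs)]  # assumed round-robin frequency assignment
        descriptor[i] = (features[i] * dct_basis(h, w, u, v)).mean()
    hidden = np.maximum(w1 @ descriptor, 0.0)        # ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate in (0, 1)
    return features * weights[:, None, None]


# Usage on random data with hypothetical sizes.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 14, 14))
w1 = rng.standard_normal((4, 8))
w2 = rng.standard_normal((8, 4))
y = suppress_salient(frequency_channel_attention(x, w1, w2), ratio=0.1)
```

With frequency `(0, 0)` the DCT basis is constant, so the descriptor reduces to global average pooling; the additional frequencies are what would let the gate respond to texture rather than mean intensity alone.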