Identifying fish species and diseases is critical for intelligent aquaculture, directly influencing productivity, sustainability, and economic viability. However, existing approaches largely treat species identification and pathological classification as independent tasks, which limits their ability to capture interdependent features under complex real-world conditions such as occlusion, low contrast, dynamic backgrounds, and high inter-class similarity. Moreover, class imbalance, cross-species variability, and fine-grained feature discrimination remain insufficiently addressed. To overcome these limitations, this paper proposes a hybrid ConvNeXt–BiLSTM–multi-head self-attention (MHSA) framework for joint fish species and disease classification. A ConvNeXt-Small backbone extracts hierarchical spatial features, which are transformed into a structured sequence and processed by a bidirectional LSTM (BiLSTM) to capture contextual dependencies; an MHSA module then adaptively refines the resulting features. An auxiliary species classification branch provides multi-task regularization at no additional inference cost. The training pipeline integrates image enhancement based on contrast-limited adaptive histogram equalization (CLAHE), a focal loss with square-root inverse-frequency class weighting, targeted minority oversampling, and a two-stage progressive learning strategy with differential-rate cosine annealing, complemented by five-view test-time augmentation. For practical deployment, a YOLOv8s detector localizes fish prior to classification. Experimental results show that the proposed model attains an overall top-1 classification accuracy of 94.33%, a precision of 97.1%, a recall of 90.9%, an mAP50 of 96.1%, an F1-score of 0.9264, and a macro AUC of 0.994, while running at 213.3 FPS, offering a robust and efficient solution for real-time fish disease screening.
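The square-root inverse-frequency focal loss named above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following are assumptions made for illustration only: each class weight is taken as 1/sqrt(class count), the weights are normalised to a mean of 1, the focusing parameter is gamma = 2.0, and the class counts are hypothetical. The function names are likewise illustrative and are not taken from the paper.

```python
import math


def sqrt_inverse_frequency_weights(class_counts):
    """Weight each class by 1/sqrt(count), rescaled so the weights average 1.

    The mean-1 normalisation is an assumption; the paper may normalise differently.
    """
    raw = [1.0 / math.sqrt(count) for count in class_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def focal_loss(probs, target, weights, gamma=2.0):
    """Class-weighted focal loss for one sample.

    probs   : softmax probabilities over the classes
    target  : index of the true class
    weights : per-class weights, e.g. from sqrt_inverse_frequency_weights
    gamma   : focusing parameter (2.0 is an assumed value, not the paper's)
    """
    # Clamp the true-class probability to avoid log(0).
    p_t = min(max(probs[target], 1e-12), 1.0)
    # (1 - p_t)^gamma down-weights samples the model already classifies well.
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical class counts: one frequent class and two minority classes.
counts = [900, 100, 25]
weights = sqrt_inverse_frequency_weights(counts)

# The same predicted confidence costs more on the rarest class.
loss_majority = focal_loss([0.6, 0.3, 0.1], target=0, weights=weights)
loss_minority = focal_loss([0.1, 0.3, 0.6], target=2, weights=weights)
```

Compared with plain inverse-frequency weighting, the square root compresses the spread of the weights: in this example the rarest class is weighted 6 times the most frequent one, rather than 36 times, which keeps minority classes emphasised without letting them dominate the gradient.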
Ahmad et al. (Sat,) studied this question.