This paper proposes a Dual-Stream Cross-Scale Interactive Diagnosis Network (DCID-Net) for intelligent fault detection and segmentation of substation equipment. Unlike traditional pipelines that run You Only Look Once (YOLO) and Mask R-CNN sequentially, DCID-Net establishes a collaborative dual-stream framework in which an improved YOLO-based Global Perception Stream (GPS) and a Mask R-CNN-inspired Local Refinement Stream (LRS) operate in parallel and exchange features through a Cross-Scale Interactive Fusion Module (CIFM). The model jointly optimizes detection, segmentation, and inter-stream consistency via a multi-objective loss, and employs a dual-phase inference strategy that adaptively balances accuracy and speed between edge and cloud deployment. Experiments on the CPLID dataset show that DCID-Net achieves 96.8% mAP and 94.8% mIoU, surpassing YOLOv5, Mask R-CNN, and Cascade R-CNN in precision, boundary fidelity, and robustness under challenging illumination and occlusion conditions.
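To make the idea of a multi-objective loss with an inter-stream consistency term more concrete, the following is a minimal sketch and not the formulation used by DCID-Net, which the abstract does not specify. It assumes that the total loss is a weighted sum of the three terms and that consistency is measured as one minus the IoU between a detection box and the tightest box around the predicted mask. The function names, the weights `w_seg` and `w_con`, and the choice of consistency measure are all hypothetical.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[int]]) -> Box:
    """Return the tightest box enclosing the non-zero pixels of a binary mask."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask contains no foreground pixels")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def consistency_loss(det_box: Box, mask: List[List[int]]) -> float:
    """Hypothetical inter-stream term: penalize disagreement between the
    detection box and the extent of the segmentation mask."""
    return 1.0 - box_iou(det_box, mask_to_box(mask))


def total_loss(l_det: float, l_seg: float, l_con: float,
               w_seg: float = 1.0, w_con: float = 0.5) -> float:
    """Assumed weighted-sum combination; the weights are illustrative only."""
    return l_det + w_seg * l_seg + w_con * l_con


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    l_con = consistency_loss((0, 0, 3, 3), mask)
    print(l_con, total_loss(1.0, 2.0, l_con))
```

In this sketch the consistency term is zero when the two streams agree exactly on an object's extent and grows as they diverge, which is one simple way to couple otherwise independent detection and segmentation objectives.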
Zhang et al. studied this question.