What question did this study set out to answer?

To develop a robust multi-modal detection method for substation equipment using the YOLOv11 architecture.

May 7, 2026 · Open Access

A multi-modal information interaction-based detection method for substation equipment using YOLOv11

Key Points

Aims to develop a robust multi-modal detection method for substation equipment using the YOLOv11 architecture.
Proposes a multi-modal detection framework integrating visible and infrared modalities.
Utilizes three novel modules: FIEI for feature extraction, MFSM for feature merging, and CFE for noise suppression.
Evaluates the framework on a self-built multimodal dataset of substations.
Achieves 91.3% mAP@0.5, outperforming single-modal detection methods.
Improves detection accuracy by an average of 10.87% over mainstream image fusion methods.
Demonstrates strong robustness against adverse weather conditions and maintains low computational complexity.
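
The mAP@0.5 figure above relies on an intersection-over-union (IoU) threshold of 0.5: a predicted box counts as a correct detection only if its IoU with a ground-truth box of the same class is at least 0.5. The sketch below illustrates that matching criterion only. The function names and the (x1, y1, x2, y2) box format are assumptions made for illustration and are not taken from the study.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) tuples; this format is an assumption
    for illustration, not something specified by the study.
    """
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: the matching rule behind the '@0.5' in mAP@0.5."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)
    print(is_true_positive((2, 0, 12, 10), ground_truth))  # overlap of 80 / 120
    print(is_true_positive((5, 0, 15, 10), ground_truth))  # overlap of 50 / 150
```

Average precision is then computed per class from the precision-recall curve built on these matches, and mAP is the mean over classes.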

Abstract

• A multi-modal detection framework for substation equipment based on YOLOv11 is proposed.
• Three novel modules (FIEI, MFSM, CFE) boost multi-modal feature fusion and enhancement.
• The method achieves 91.3% mAP@0.5, outperforming single-modal and mainstream fusion methods.
• Strong robustness under adverse weather and registration errors in real inspection scenarios.
• Maintains low computational complexity and high real-time performance for engineering use.

Reliable detection and localization of substation equipment under normal operating conditions is paramount for the autonomous inspection of power systems. However, traditional single-modal detection methods often suffer performance degradation under adverse lighting conditions or complex thermal backgrounds. This paper proposes a robust multi-modal information interaction detection framework based on the state-of-the-art YOLOv11 architecture. To leverage complementary information from the visible and infrared modalities, three novel modules are integrated: (1) the Feature Information Extraction and Integration (FIEI) module, designed to capture fine-grained spatial and thermal features; (2) the Multi-modal Feature Shunting and Merging (MFSM) module, which adaptively resolves feature conflicts and synchronizes heterogeneous data; and (3) the Cross-modal Feature Enhancement (CFE) mechanism, which employs attention-based interaction to suppress noise in low-quality images. Experimental results on a self-built multi-modal substation dataset show that the proposed method reaches an accuracy of 91.3%, which is 15.56% higher than the visible-light image detection method and 18.38% higher than the infrared image detection algorithm. Compared with mainstream image fusion detection methods, detection accuracy improves by an average of 10.87%. While maintaining relatively low computational complexity, the method significantly suppresses missed and false detections, showing strong performance for equipment localization and detection in normal operation scenarios.
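
The abstract describes attention-based interaction between visible and infrared features but does not give the internals of the CFE mechanism. As a rough intuition only, the sketch below shows a generic per-channel attention-weighted fusion: each channel's two modalities are scored by mean absolute activation, and a softmax over the two scores decides how much each modality contributes. This is not the paper's CFE, FIEI, or MFSM module; the scoring rule, the function names, and the flattened-list feature layout are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_channels(visible, infrared):
    """Generic attention-weighted fusion of two modalities (illustrative only).

    `visible` and `infrared` are lists of channels; each channel is a
    flattened list of activations of the same length. This layout is an
    assumption and does not reflect the study's actual tensors.
    """
    fused = []
    for vis_ch, ir_ch in zip(visible, infrared):
        # Global average pooling of absolute activations as a crude
        # "how informative is this modality here" score (assumed rule).
        vis_score = sum(abs(x) for x in vis_ch) / len(vis_ch)
        ir_score = sum(abs(x) for x in ir_ch) / len(ir_ch)

        # Softmax turns the two scores into weights that sum to 1.
        w_vis, w_ir = softmax([vis_score, ir_score])

        fused.append([w_vis * v + w_ir * i for v, i in zip(vis_ch, ir_ch)])
    return fused


if __name__ == "__main__":
    # A dark visible channel (all zeros) next to a strong infrared channel:
    # the fused output leans heavily toward the infrared values.
    dark_visible = [[0.0, 0.0, 0.0, 0.0]]
    strong_infrared = [[4.0, 4.0, 4.0, 4.0]]
    print(fuse_channels(dark_visible, strong_infrared))
```

A weighting of this kind shows why a second modality can help when one input is degraded, for example a visible image under poor lighting, which is the failure mode of single-modal detection that the abstract points to.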
