The integration of Artificial Intelligence (AI) and the Internet of Things (IoT) has catalyzed the emergence of the Artificial Intelligence of Things (AIoT), which is revolutionizing critical sectors such as energy, industry, and transportation by enabling intelligent and autonomous systems. By 2025, global IoT connections are projected to reach 24.6 billion, with China expected to account for 30% of this growth. However, the Electric Power IoT, a pivotal component of modern infrastructure, faces persistent challenges in ultra-large-scale device connectivity, secure data transmission, and multi-source heterogeneous data governance, particularly under weak network conditions and in complex environments. Traditional single-modal analysis methods, including SVMs, Random Forests, CNNs, and LSTMs, struggle to leverage complementary information across diverse sensing modalities and to adapt to dynamic real-world noise, which limits their effectiveness in mission-critical power system applications. To address these limitations, we propose HIGHMMT, a High-Modality Multi-Modal Learning framework that integrates time-series signals, audio, images, and text through a hierarchical fusion architecture. Our approach features a Task-Parallel Cross-Modal Feature Transformer (TPCFT) for adaptive feature extraction and a dual-decoder structure for deep multi-modal fusion. This design aligns with the AIoT paradigm of embedding intelligence at the edge to enable autonomous analytics and decision-making. Experimental results demonstrate a 3.4% improvement in accuracy over single-modal fault detection, highlighting the framework's suitability for resource-constrained IoT environments.
Guan et al. studied this question.