What question did this study set out to answer?

The research aims to develop a multi-modal learning framework to improve fault detection in power sector IoT applications.

February 28, 2026 · Open Access

HIGHMMT: a multi modal intelligent governance framework for the Internet of Things in the power sector

Key Points

Proposed HIGHMMT framework integrates time-series signals, audio, images, and text.
Utilizes Task-Parallel Cross-Modal Feature Transformer for feature extraction.
Employs a dual-decoder structure for deep multi-modal fusion.
Focuses on addressing challenges in device connectivity and data governance.
Demonstrated a 3.4% improvement in accuracy over single-modal methods for fault detection.
Highlighted suitability for resource-constrained IoT environments.
Showed effectiveness in adapting to real-world noise and dynamic scenarios.

Abstract

The integration of Artificial Intelligence (AI) and the Internet of Things (IoT) has catalyzed the emergence of the Artificial Intelligence of Things (AIoT), which is revolutionizing critical sectors such as energy, industry, and transportation by enabling intelligent and autonomous systems. By 2025, global IoT connections are projected to reach 24.6 billion, with China expected to account for 30% of this growth. However, the Electric Power IoT, a pivotal component of modern infrastructure, faces persistent challenges in ultra-large-scale device connectivity, secure data transmission, and multi-source heterogeneous data governance, particularly under weak network conditions and complex environments. Traditional single-modal analysis methods, including SVMs, Random Forests, CNNs, and LSTMs, struggle to leverage complementary information across diverse sensing modalities or adapt to dynamic real-world noise, limiting their effectiveness in mission-critical power system applications. To address these limitations, we propose HIGHMMT, a High-Modality Multi-Modal Learning framework that integrates time-series signals, audio, images, and text through a hierarchical fusion architecture. Our approach features a Task-Parallel Cross-Modal Feature Transformer (TPCFT) for adaptive feature extraction and a dual-decoder structure for deep multi-modal fusion, aligning with the AIoT paradigm of embedding intelligence at the edge to enable autonomous analytics and decision-making. Experimental results demonstrate a 3.4% improvement in accuracy over single-modal fault detection methods, highlighting the framework's suitability for resource-constrained IoT environments.
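The abstract does not describe the internals of TPCFT or the dual decoder, so the following is only a generic illustration of the underlying idea: projecting features from several modalities into a shared space and letting them attend to one another before classification. It is a minimal sketch under stated assumptions, not the authors' architecture. The function names, feature dimensions, random weights, and the single attention step are all hypothetical, and the dual-decoder stage is omitted entirely.

```python
import math
import random

# Hypothetical sizes chosen for illustration only; the paper does not state them.
D_MODEL = 8      # shared embedding width
N_CLASSES = 2    # e.g. "fault" vs "normal"
DIMS = {"audio": 16, "image": 32, "text": 8, "timeseries": 12}


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(vec, mat):
    """Row vector of length n times an n x d matrix, giving a list of length d."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def cross_modal_attention(tokens):
    """Single-head scaled dot-product attention across modality tokens."""
    d = len(tokens[0])
    attn = [softmax([dot(q, k) / math.sqrt(d) for k in tokens]) for q in tokens]
    mixed = [[sum(w * t[j] for w, t in zip(row, tokens)) for j in range(d)]
             for row in attn]
    return mixed, attn


def fuse_and_classify(features, proj, w_out):
    """Project each modality, mix the tokens with attention, average, classify."""
    names = sorted(features)
    tokens = [matvec(features[n], proj[n]) for n in names]
    mixed, attn = cross_modal_attention(tokens)
    fused = [sum(col) / len(mixed) for col in zip(*mixed)]  # mean over modalities
    return softmax(matvec(fused, w_out)), attn


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Untrained random weights and synthetic inputs stand in for real encoders and data.
rng = random.Random(0)
features = {n: [rng.gauss(0.0, 1.0) for _ in range(k)] for n, k in DIMS.items()}
proj = {n: random_matrix(rng, k, D_MODEL) for n, k in DIMS.items()}
w_out = random_matrix(rng, D_MODEL, N_CLASSES)

probs, attn = fuse_and_classify(features, proj, w_out)
print("class probabilities:", probs)
print("attention row for first modality:", attn[0])
```

Each row of `attn` indicates how strongly one modality draws on the others, which is the sense in which fusion can exploit complementary information that a single-modal model never sees.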
