What question did this study set out to answer?

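The study set out to develop an interpretable model for power load forecasting that addresses two limitations of current methods: multilayer perceptron networks whose predictions are difficult to interpret, and a one-step generation paradigm that ignores temporal dependencies in the forecast series and requires separate training for each prediction length.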

March 7, 2026 · Open Access

KANformer: A Flexible Kolmogorov-Arnold Transformer for Power Load Forecasting

Key Points

Aims to develop an interpretable power load forecasting model that avoids hard-to-interpret multilayer perceptron networks and one-step generation.
Proposes the KANformer architecture, a Kolmogorov-Arnold network (KAN)-based Transformer, as the forecasting backbone.
Recasts forecasting as a language modeling task by using patching to project time series into patch-based representations.
Trains with an autoregressive optimization function to model temporal dependencies within the prediction range.
Evaluates the model on two real-world power grid load datasets.
Reports superior performance for KANformer compared to traditional models.
Reports better generalization ability across different datasets.
Captures complex variation patterns in power load time-series data.

Abstract

With economic growth and rising living standards, electricity demand is becoming more complex and volatile. Power load forecasting is a key part of power system planning, operation, and management: accurate forecasts let grid dispatching departments make reasonable generation plans and schedule equipment maintenance in advance. However, two issues remain in current power load forecasting methods: (1) They commonly build the overall network from multilayer perceptrons, which makes it extremely difficult to interpret how the models arrive at specific predictions. (2) They commonly use a one-step generation paradigm with a customized forecasting head, which ignores the temporal dependencies within the forecast series and requires separate training for each prediction length. To address these issues, a novel interpretable Kolmogorov-Arnold network (KAN)-based Transformer architecture (KANformer) is proposed as the model backbone to capture the variation patterns of power load time-series data. Specifically, KANformer transforms the forecasting task into a standard language modeling task. It uses patching to project time series into patch-based representations. During training, an autoregressive optimization objective replaces the traditional single-step generation scheme. This lets the model capture temporal dependencies within the prediction range at the patch level through autoregressive inference. It can also adapt to various power grid load datasets with different prediction settings without any modifications. Experimental results on two real-world power grid load datasets show that KANformer achieves superior performance and generalization ability.
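
The patch-level autoregressive scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes non-overlapping patches and a next-patch training target (the usual language modeling formulation), and every name in it (`make_patches`, `next_patch_pairs`, `autoregressive_forecast`, `persistence_model`) is hypothetical. The placeholder predictor simply repeats the last patch and stands in for a trained network such as KANformer.

```python
import numpy as np


def make_patches(series, patch_len):
    """Split a 1-D series into non-overlapping patches (one patch = one token)."""
    values = np.asarray(series, dtype=float)
    n = len(values) // patch_len
    # Drop the oldest leftover points so the most recent data is kept.
    trimmed = values[len(values) - n * patch_len:]
    return trimmed.reshape(n, patch_len)


def next_patch_pairs(patches):
    """Build training pairs: each patch is used to predict the patch after it.

    Assumption: the autoregressive objective is next-patch prediction.
    """
    return patches[:-1], patches[1:]


def persistence_model(context):
    """Placeholder predictor (hypothetical): repeat the last observed patch."""
    return context[-1].copy()


def autoregressive_forecast(history, horizon, patch_len, predict_next):
    """Generate `horizon` points one patch at a time.

    Each predicted patch is appended to the context before the next step,
    so one predictor serves any horizon without a horizon-specific head.
    """
    context = make_patches(history, patch_len)
    generated = []
    produced = 0
    while produced < horizon:
        nxt = predict_next(context)           # shape: (patch_len,)
        generated.append(nxt)
        context = np.vstack([context, nxt])   # feed the prediction back in
        produced += patch_len
    # Truncate in case the horizon is not a multiple of patch_len.
    return np.concatenate(generated)[:horizon]


load = np.arange(10.0)                        # toy stand-in for a load series
inputs, targets = next_patch_pairs(make_patches(load, patch_len=4))
forecast = autoregressive_forecast(load, horizon=6, patch_len=4,
                                   predict_next=persistence_model)
print(inputs.shape, targets.shape, forecast.shape)
```

Because the rollout loop only needs a function that maps a patch context to the next patch, the same predictor can be reused for different prediction lengths, which is the property the abstract contrasts with one-step generation heads trained separately per horizon.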
