What question did this study set out to answer?

The aim is to improve long-term forecasting accuracy in multivariate time series by addressing limitations in existing models.

January 25, 2026

A Universal Multivariate Long‐Term Time‐Series Robust Forecasting Model With Distinguishable Variable Identifier

Key Points

Developed a multivariate forecasting model with learnable variable identifiers
Utilized multi-scale CNNs for local feature extraction
Employed a Transformer sub-network for capturing long-term dependencies
Introduced a new loss function combining MSE and MAE to enhance robustness
Achieved up to 4.4% relative improvement over the Transformer-based baseline (PatchTST)
Improved differentiation of individual variable forecasts
Enhanced handling of noise and outliers compared to existing models
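
The summary does not give the formula for the loss that combines MSE and MAE. As a stand-in only, the sketch below uses the well-known Huber loss, which is quadratic (MSE-like) for small errors and linear (MAE-like) for large ones. The function name `huber_loss` and the threshold `delta` are assumptions made for illustration, not the authors' definition.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss over two equal-length sequences of floats.

    Stand-in for "a loss combining MSE and MAE"; the paper's own
    formula is not stated in this summary.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            # Small error: quadratic penalty, as in MSE.
            total += 0.5 * err * err
        else:
            # Large error: linear penalty, as in MAE, so outliers weigh less.
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


def mse(y_true, y_pred):
    """Plain mean squared error, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# One well-predicted point and one outlier.
targets = [0.0, 0.0]
preds = [0.5, 3.0]
robust = huber_loss(targets, preds, delta=1.0)
squared = mse(targets, preds)
```

In this example the outlier dominates the squared error far more than it dominates the Huber value, which is the kind of robustness to noise and outliers the key points describe.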

Abstract

Recent studies have shown that, in multivariate long‐term time‐series forecasting, linear models employing a channel‐independent (CI) strategy tend to outperform Transformer‐based models that do not explicitly model cross‐channel interactions, including FEDformer, Autoformer, and Informer. This finding has cast doubt on the ability of the Transformer's attention mechanism to capture temporal dependencies effectively. To address this issue, the Transformer‐based PatchTST model also adopts the CI strategy. However, follow‐up work has revealed that CI models suffer from the drawback of “spatial indistinguishability”; that is, they collapse several variables into an identical forecast whenever the past observations of those variables look alike, even if the variables later evolve in opposite directions. To overcome this limitation, we attach learnable variable identifiers to the embeddings of multivariate time series, enabling the model to differentiate individual variables. We further observe that these identifiers can be leveraged to analyze inter‐variable similarities. Additionally, we employ multi‐scale CNNs in the shallow layers to extract rich local temporal features while reducing computational overhead, and a Transformer sub‐network to capture long‐term dependencies. By combining the strengths of both components, the model extracts both local and global structures inherent in multivariate time series. Moreover, to enhance robustness against noise and outliers and to mitigate overfitting, we introduce a novel loss function that integrates the advantages of mean squared error (MSE) and mean absolute error (MAE). Extensive experiments on six widely used open‐source datasets demonstrate that our model consistently outperforms the Transformer‐based baseline (PatchTST), achieving a maximum relative improvement of 4.4%.
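
The "spatial indistinguishability" problem and the identifier fix can be illustrated with a toy sketch. It assumes a shared linear embedding applied to each variable independently and an identifier vector that is added to the embedding; the abstract says only that identifiers are "attached", so addition, the random initial values, and all names here (`embed`, `add_identifier`, `identifiers`) are illustrative assumptions rather than the paper's architecture.

```python
import random


def embed(history, weights):
    """Shared (channel-independent) linear embedding of one variable's history."""
    return [sum(w * x for w, x in zip(row, history)) for row in weights]


def add_identifier(embedding, identifier):
    """Attach a per-variable identifier by element-wise addition (an assumption)."""
    return [e + i for e, i in zip(embedding, identifier)]


random.seed(0)
lookback, d_model, n_vars = 4, 3, 2

# One weight matrix shared by every variable, as in a CI model.
weights = [[random.gauss(0, 1) for _ in range(lookback)] for _ in range(d_model)]

# One identifier vector per variable; in training these would be learned
# parameters, here they are random placeholders.
identifiers = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_vars)]

# Two different variables that happen to share the same past observations.
history = [1.0, 2.0, 3.0, 4.0]

plain = [embed(history, weights) for _ in range(n_vars)]
tagged = [add_identifier(e, identifiers[v]) for v, e in enumerate(plain)]

print("identical without identifiers:", plain[0] == plain[1])
print("identical with identifiers:   ", tagged[0] == tagged[1])
```

Without identifiers, the two variables produce the same embedding and any downstream network must give them the same forecast. With identifiers, the embeddings differ, so the model has a way to tell the variables apart.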

Cite This Study

Chen et al. studied this question.

synapsesocial.com/papers/6975b24dfeba4585c2d6dbc6 https://doi.org/10.1002/for.70105
