ABSTRACT
Recent studies have shown that, in multivariate long‐term time‐series forecasting, linear models employing a channel‐independent (CI) strategy tend to outperform Transformer‐based models that do not explicitly model cross‐channel interactions, including FEDformer, Autoformer, and Informer. This finding has cast doubt on the ability of the Transformer's attention mechanism to capture temporal dependencies effectively. In response, the Transformer‐based PatchTST model also adopts the CI strategy. However, follow‐up work has revealed that CI models suffer from “spatial indistinguishability”: they produce an identical forecast for several variables whenever the past observations of those variables look alike, even if the variables later evolve in opposite directions. To overcome this limitation, we attach learnable variable identifiers to the embeddings of multivariate time series, enabling the model to differentiate individual variables. We further observe that these identifiers can be used to analyze inter‐variable similarities. Additionally, we employ multi‐scale CNNs in the shallow layers to extract rich local temporal features while reducing computational overhead, and a Transformer sub‐network to capture long‐term dependencies. By combining the two components, the model extracts both the local and the global structures inherent in multivariate time series. Moreover, to enhance robustness against noise and outliers and to mitigate overfitting, we introduce a novel loss function that integrates the advantages of mean squared error (MSE) and mean absolute error (MAE). Extensive experiments on six widely used open‐source datasets demonstrate that our model consistently outperforms the Transformer‐based baseline (PatchTST), achieving a maximum relative improvement of 4.4%.
Chen et al. studied this question.
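Two ideas from the abstract can be illustrated with a toy sketch: why a shared channel‐independent map cannot separate variables with identical histories, and how a loss might blend MSE and MAE. Everything below is an assumption made for illustration. The abstract gives neither the form of the identifiers nor the loss formula, so the additive identifier, the fixed identifier values (standing in for learned parameters), the function names `ci_linear_forecast` and `mixed_loss`, and the convex combination with weight `alpha` are all hypothetical and are not the paper's method.

```python
def ci_linear_forecast(history, weights, identifier=None):
    """One-step forecast from a lookback window with weights shared by all variables.

    `identifier` is a hypothetical per-variable vector added element-wise to
    the window, standing in for a learnable variable identifier attached to
    the embedding.
    """
    if identifier is not None:
        history = [h + e for h, e in zip(history, identifier)]
    return sum(w * h for w, h in zip(weights, history))


def mixed_loss(pred, target, alpha=0.5):
    """Assumed blend of the two errors: alpha * MSE + (1 - alpha) * MAE."""
    errors = [p - t for p, t in zip(pred, target)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return alpha * mse + (1 - alpha) * mae


# Two different variables that happen to share the same lookback window.
window = [1.0, 2.0, 3.0, 4.0]
weights = [0.1, 0.2, 0.3, 0.4]  # shared across variables (CI strategy)

# Without identifiers, the shared map must return the same forecast for both.
plain_a = ci_linear_forecast(window, weights)
plain_b = ci_linear_forecast(window, weights)

# With per-variable identifiers (fixed here; learned in a real model),
# the two inputs differ, so the forecasts can differ as well.
id_a = [0.5, 0.0, 0.0, 0.0]
id_b = [-0.5, 0.0, 0.0, 0.0]
with_id_a = ci_linear_forecast(window, weights, id_a)
with_id_b = ci_linear_forecast(window, weights, id_b)

print(plain_a == plain_b)      # identical forecasts for identical histories
print(with_id_a != with_id_b)  # identifiers make the variables distinguishable
print(mixed_loss([1.0, 2.0], [0.0, 4.0], alpha=0.5))
```

In this sketch, `alpha` trades the outlier sensitivity of the squared term against the robustness of the absolute term; other ways of combining the two errors (for example, a Huber‐style piecewise loss) would be equally consistent with the abstract's description.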